
"Trends and Information on AI, Big Data, Data Science, New Data Management Technologies, and Innovation."

This is the Industry Watch blog. To see the complete ODBMS.org
website with useful articles, downloads and industry information, please click here.

Jun 15 26

Trust Is Not a Feeling: Nuno Galante Valério on Engineering Accountability into AI for High-Stakes Healthcare

by Roberto V. Zicari

“On Innovation” series

“The way most AI conversations use “trust,” it names a feeling – and you can’t engineer a feeling.”

Q1. What do the builders of AI consistently fail to understand about deploying their work in a GxP environment, where the cost of being wrong is measured in patient safety?

Nuno Galante Valério: If I have to choose one thing: they don’t feel the distance between a demo that works and a system you can deploy. That distance is the entire job, where the whole effort is. It’s where I’ve spent my career.

I’ve sat through this meeting many times: a vendor, or one of our own teams, shows me something that genuinely impresses the room. The model reads a batch record, finds the deviation, drafts the CAPA, and does it faster and more carefully than the person who used to. Someone says the word “production-ready,” and means it. So, I ask them to run it again, same input. They do, and what comes back is almost the same. A sentence in a different order. A risk worded a little differently. A reference that was there the first time and, the second time, quietly isn’t. The mood in the room changes, because everyone understands at once that “almost the same” is not something you can write into a validation report, and put your name under.

Now, the easy lesson to draw from that room is the wrong one – that generative systems are too unstable to let near anything that matters. Europe’s first instinct, in its draft guidance for AI in manufacturing, was close to that: keep these models away from critical operations. The part I find genuinely interesting is that the direction is already moving off it, toward a risk-based view, and I think that correction is right. It turns on a distinction the builders almost never start from: risk is a property of the function, not of the technology. A frozen, deterministic model making a release decision with nobody checking it is more dangerous than a probabilistic one drafting something a qualified person reviews before it goes anywhere. The variation I provoked in the room was never the hazard; the hazard is letting any output, stable or not, reach a place you can’t walk it back from, without a control built to catch it. It’s why, when my team sizes up an AI use, the first questions aren’t about the model at all – they’re how critical the function is, how much the thing decides on its own, and whether we’d even notice it going wrong.

Here is what the builders are actually missing, and they miss it because everything in their world rewards them for missing it. They optimize for capability — can the system do the task, well, fast. The regulated world doesn’t start there. It starts somewhere stranger: can you tell me, in advance and in writing, the edge of what this thing will do, so that inside the edge I’m never surprised, and outside it I can prove I had something in place to catch it. And the failure that keeps me awake isn’t the one the demo shows. The demo shows what the model catches. I’m paid to worry about what it misses, because a miss in my world doesn’t raise its hand. A false alarm announces itself and someone investigates; a missed signal just sits there, looking like nothing happened.

So, the failure isn’t really technical. Most of these people are far better engineers than I’ll ever be. What they haven’t done, what they’ve never been asked to do, is be the person whose name goes on the line that says I am accountable for what this does in front of a patient, and for what it fails to do. If you’ve never had to sign that, “it works” feels like the finish. Once you have, “it works” is maybe halfway, and the easy half. The other half has no demo in it. It’s building the argument for why the risk that remains is acceptable, and then defending that argument to an inspector whose job is to assume you got it wrong.

I don’t say this to be hard on them; you can’t really know it until you’ve lived it. I say it because the most interesting work in the field right now is sitting in that gap, between “it works” and “I’d stake my name on it”; and almost nobody upstream has noticed the gap is even there.

Q2. Give us a concrete example where the governance process was itself the site of genuine innovation – where something was invented that would not have existed without it.

Nuno Galante Valério: The honest, real version of this starts with a failure, because the useful thing came out of the failure itself.

We had a system – document-grounded, retrieval-based, the kind that answers a quality question by pulling from a controlled procedure corpus rather than from the model’s own memory. By every measure we had, it passed. Retrieval was solid, the prompts frozen, the version pinned, the test cases green. The validation evidence was complete. And as the process owner, I wouldn’t give my sign-off. Not because I could point at a defect (I couldn’t, the validation was clean) but because “the protocol passed” and “I’ll stand behind this running in my process for the next eighteen months” are not the same statement, and the second one is what my signature actually carries.

Sitting in that gap is what recently sent me to Petri Pohjanen. He’d spent years in automotive functional safety – ISO 26262, the world where software steers a moving car and a wrong output is a crash (not a typo) – and he’d held release authority, so he had personally signed the kind of statement I was hesitating over. Automotive had already solved, twenty years ago, a version of the exact thing I was stuck on: how do you take responsibility for a system you can’t test exhaustively. Their answer was never to make it deterministic. It was the safety case: a structured, layered argument that the risk of failure that remains is low enough to accept, with evidence under each layer. I’d been trying to discharge, with a test report, something that was only ever going to yield to an argument.

What came out of working together we called the Layered Assurance Stack, work that Petri and I are still developing in the open. Three layers that deliberately don’t collapse into one another. The first is what the system is allowed to do in the first place. The second is how it can fail in ways that have nothing to do with a broken component (this is where automotive’s SOTIF thinking carries over, the failures that come not from a part breaking but from the system meeting a situation outside the assumptions it was designed around). The third is what has to exist inside the organization to catch those failures while it’s running. Run the three together, and you get a proportionality result: how much assurance this particular use, in this particular context, actually needs. We gave the result a name and a set of tiers, but honestly the name is the least interesting part of it. The moment you name a tier, people start treating it as a standard instead of as the answer to a question, and the thinking stops.

Here’s the part that wouldn’t exist without the governance problem forcing it. What pharma was missing was never a better test. It was a language for arguing about probabilistic systems that an auditor can actually follow. The field had two reflexes: set the temperature to zero, and pretend you’ve made the thing deterministic; or refuse to deploy at all – and both are answers to a question nobody should be asking. 

There was nothing in between, so we had to build the in-between. And the only reason we could is that I’d hit a wall where my existing tools told me a system was fine and my own judgment told me it wasn’t, and I refused to settle that by trusting the tools over the judgment.

The cost of it, since this series is about honesty and not press releases: it’s slow. It needs a certain organizational maturity. It needs you to disagree, sometimes sharply, with people you respect. And it needs patience to build at any real scale. The vocabulary is further along than the adoption, right now. Closing that distance is the part still in front of me and many of my peers.


Q3. ICH E6(R3) and the broader GxP framework assume deterministic, validated software. Generative AI is probabilistic and non-deterministic. How are you and your peers actually handling that tension in practice – not in principle?

Nuno Galante Valério: In practice, it gets handled by moving what you validate, which is a far quieter answer than the public debate would suggest.

The initial instinct is to ask how you validate the model. That question has no good answer, because the model is the part that won’t hold still. So, the people doing this seriously validate something else: the process made of a human and a system together, with the model sitting inside a control envelope as one component, rather than being the thing on trial. You don’t qualify the language model. You qualify the workflow around it – a person of defined competence reviewing the output against a defined standard, with the boundaries written down and the failure modes named before you start. The model is allowed to be probabilistic, as long as the process containing it is controlled. And that isn’t a dodge. It’s the same move we’ve always made with people: we never validated the analyst’s mind; we validated the procedure the analyst worked inside, because the analyst was fallible too and we knew it.

The second shift is harder, and it’s the really unsettling one – the move from validating at a point in time to monitoring over time. Classical validation works as a photograph. You show the system was right on the day you tested it, then you freeze it. But there’s a thread in the interpretability research, Anthropic’s included, about the gap between the reasoning a model states and the computation it actually performs. Take that seriously enough, and the photograph stops meaning much. If a system can drift, and if the reasons it gives you aren’t reliably the reasons it acted on, then proving it was correct on day one tells you very little about day two hundred. Validation has to become something closer to surveillance. You’re not proving correctness once; you’re sampling for it, continuously, against a population of inputs that keeps moving under you.

That points at a role with no name yet, which I think is the single most important unbuilt thing in the field. Some hybrid of quality assurance and data science – a person who can read a control chart and a model card with equal fluency, who watches a production AI system the way a process engineer watches a control strategy. That person isn’t on the pharma org chart yet. The data scientists rarely think in GxP (actually, often avoid it) and the quality people rarely think in distributions, so whoever holds both frames at once has usually arrived there by accident. Somebody is going to have to build that into a profession on purpose.

So, the honest answer to “how are you handling it”: imperfectly, and by learning as we go. The frameworks haven’t caught up, so for now it’s people building the bridge while they’re standing on it. Uncomfortable. It’s also, I’d argue, the fastest way to find out what the bridge actually has to carry.


Q4. You lead a “trust architecture” for AI in GxP. What does trust actually mean as an engineering requirement – how do you decompose it into properties that can be specified, tested, monitored, and maintained?

Nuno Galante Valério: I’d start by taking the word back from itself, because the way most AI conversations use “trust,” it names a feeling – and you can’t engineer a feeling. What you can engineer are the conditions that make the feeling unnecessary. A patient swallows a tablet without auditing the supply chain behind it. Not because they’ve decided to believe in it, but because a century of architecture has already absorbed the complexity, so they don’t have to. That absorbed, invisible structure is what trust actually is, once you stop treating it as an emotion. And notice where it lives: not in the tablet, but in everything standing behind it. With AI it’s the same, and it’s the whole reason I named the work the way I did: the trust that matters was never going to live inside the model. It lives in what you build around it.

So, the question I work on is: what does a system have to do, structurally, before it earns that kind of invisibility. Looking across pharmaceutical regulation, aviation, banking, nuclear, food safety, the blood supply, the machinery of courts and professions – seven functions kept reappearing. Not because they’re the only things present in any one regime, but because their absence is what turns up in the post-mortem whenever trust collapses. Thalidomide was a surveillance failure. The 2008 crisis was a failure of provenance and verifiability. Tuskegee – men left untreated for a disease that had a cure – was a failure of recourse. Each one fails in its own characteristic way, and the mature version of every trust regime is, if you look closely, the scar tissue from once having been missing that function.

The seven are provenance, verifiability, accountability, reversibility, legibility, recourse, and surveillance. Rather than march through all seven, I’ll share how they group, because the grouping is what does the work. Provenance and verifiability are the is-it-what-it-claims pair: can you trace every component to its origin, and can someone not aligned with the maker check the claims independently. For most production AI in 2026, the honest answer to both is “not really” – we often cannot say who labelled the training data, or under what consent, and frontier evaluation is largely self-reported by the lab that trained the model, on benchmarks it partly designed. Accountability and reversibility are the can-it-be-answered-for-and-undone pair. Legibility and recourse are the can-the-affected-human-see-it-and-get-a-remedy pair. And surveillance stands alone – the population-level function that catches the slow, aggregate harm that no single user would ever notice in themselves.

People ask why seven, and not five or nine. Because seven is the smallest set that survives a comparative test. Drop one and you find you’ve fused two functions that do genuinely different jobs; add one and you’ve split a function into halves that were never really independent. I’m not claiming it’s the only taxonomy anyone could draw. I’m just claiming you can’t remove a piece without losing something you needed, or add one without repeating yourself. That’s a falsifiable claim, which is the most I can honestly offer – and I’d be glad to be proven wrong.

Where it gets interesting is that GxP doesn’t weight the seven evenly. The three that pharma tends to underbuild are, awkwardly, the three that decide whether AI is deployable at all.

Surveillance is the one the non-determinism question kept circling. Point-in-time qualification is just a photograph; a system that can drift needs continuous monitoring against a population that moves. Pharma already knows how to do this for drugs – it’s called pharmacovigilance. It just hasn’t started doing it for models.
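One way to picture surveillance of this kind is a plain proportion control chart over sampled human reviews. The sketch below is an illustration only, not a method described in the interview: the function names, the baseline rejection rate, the sample size, and the three-sigma limit are all assumptions chosen for the example.

```python
import math

def p_chart_limits(baseline_rate, sample_size, sigmas=3.0):
    """Return (lower, upper) control limits for a proportion chart."""
    spread = sigmas * math.sqrt(baseline_rate * (1 - baseline_rate) / sample_size)
    return max(0.0, baseline_rate - spread), min(1.0, baseline_rate + spread)

def flag_periods(rejects_per_period, sample_size, baseline_rate):
    """Return indices of periods whose rejection rate is above the upper limit."""
    _, upper = p_chart_limits(baseline_rate, sample_size)
    return [
        i for i, rejects in enumerate(rejects_per_period)
        if rejects / sample_size > upper
    ]

# Hypothetical data: AI outputs rejected by a qualified reviewer,
# out of 200 outputs sampled each week.
weekly_rejects = [4, 5, 3, 6, 4, 15]
flagged = flag_periods(weekly_rejects, sample_size=200, baseline_rate=0.02)
print(flagged)  # indices of weeks that call for an investigation
```

The point of the design is that the check runs on every sampling period, not once at qualification, so a shift in the input population shows up as a signal rather than passing unnoticed.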

Reversibility almost nobody builds, and in a regulated setting it’s unforgiving, because so many of the actions an AI touches can’t be taken back. You can recall a batch. You cannot easily un-make a decision that’s already propagated into a regulatory submission or a patient’s record. So, reversibility here is less an “undo button” and more a question: “is there a containment boundary that catches a wrong output before it becomes irreversible”. That’s a design property, it costs money, and it’s usually the first to be cut when a team is chasing capability.

And recourse is the one the engineering-minded want to leave out, and the one I can’t let them. When the system is wrong about something that matters, is there a path for the human to remediate it. A system can be perfectly provenanced, verifiable, accountable, reversible, legible, and surveilled, and still be untrustworthy if being wrong about you comes with no way to fix it. Recourse is the function that remembers there is a person at the end of all this, not just a number or metric. It’s also the one with no clean home in most architectures, which is exactly why it goes missing.

Decomposed this way, trust stops being a vibe in a vendor pitch (that truly doesn’t help anyone) and becomes a set of functions you can specify, assign owners to, test against, and audit. The work of a trust architecture is exactly that translation – taking a word everyone nods at (and instinctively understands), and turning it into seven things someone has to be accountable for. The moment trust has an owner and a test, it isn’t a feeling anymore. It’s engineering. 


Q5. Cerf, Kay, Stroustrup, Booch built foundations others stand on. You’re building the governance and trust infrastructure that decides whether AI can stand on those foundations in one of the highest-stakes domains there is. Looking at the next decade – what needs to be built that doesn’t yet exist, without which the most important AI applications in medicine simply won’t be deployable at scale?

Nuno Galante Valério: Two things. The second is much harder than the first, and almost no one is working on it.

The first is a regulatory science that can reason about distributions, not just instances. Our whole evidentiary tradition rests on the qualified instance: this system, tested, frozen, proven. What we need is a science that knows how to accept evidence of a different shape: this system stays within acceptable bounds, across a whole population of inputs, monitored continuously, with these statistical guarantees. That’s a different standard of proof. Regulators are edging toward it – the FDA’s predetermined change control thinking, the EMA’s Annex 22 work – but edging toward something isn’t the same as having it. Until an inspector can be trained on what “good” looks like for a monitored probabilistic system, every deployment is negotiated from scratch, and you can’t scale a thing that has to be negotiated every single time.

The second is the one I actually care about, and the hardest. We need governance that can hold disagreement without collapsing it. Nearly every framework I know, the good ones included, and mine included, works by reducing a complex system to a single verdict: approved, or classified, or certified, take your pick. One number, one answer. But the systems we govern now don’t have a single answer inside them. A model can be safe for one use, and a hazard in the one next to it. It can be defensible to one stakeholder, and unaccountable to another. It can be right on average, and catastrophic in a certain use case. Force all of that into one verdict, and you haven’t governed the complexity, you’ve basically hidden it. What we don’t have yet – in standards, in regulatory science, in how we design organizations – is a way to hold several legitimate, competing assessments at once, and stay coherent without flattening or averaging them. I’ve come to see that less as a compliance problem than as an architecture problem, which is why I think it’s the one that actually decides whether the important applications ship.

Which is the thread running under all of your questions, and the thing they keep nearly asking. So let me say it plainly in the next question, since you’ve left me the room to do it.


Qx. Having answered these, what’s the one thing you most wanted to say – about governance, about trust, about what innovation looks like from inside a regulated environment – that none of the questions gave you the right opening to say?

Nuno Galante Valério: That the hardest problem in AI governance isn’t technical, and the reason the field keeps treating it as though it were, is that we inherited our instincts from a generation of builders who worked in a world that behaved the same way twice.

The foundations your series has documented – the protocols, the languages, the methods – share a property so deep that it’s almost invisible: they’re deterministic. Same input, same output, every time. That isn’t incidental to how Cerf or Stroustrup think; it’s the ground they built on, and it’s a magnificent ground. It made software something you could reason about, prove things about, trust. The entire apparatus I work inside – validation, qualification, the regulated assurance of software – is downstream of that same assumption. Trust meant predictability, and predictability meant the thing had a single, stable, knowable behaviour.

The systems we’re building now don’t have that. A generative model has no single stable behaviour to validate – it has a distribution of behaviours, some excellent, some dangerous, none of them sovereign over the others. And here is what I’ve come to believe, and what I most wanted to say: this isn’t a defect we’ll engineer away. It’s the nature of the thing, and it’s the same nature that shows up the moment you look at any sufficiently complex system that has to act in the world. An organization is not a single coherent decision-maker; it’s a contest of legitimate, competing internal claims that somehow has to produce one decision. A regulatory regime is a parliament, not a person. Even a single expert under pressure is rarely one unified voice – they’re a negotiation. We have spent a century pretending these things are unitary, because unitary things are easier to hold accountable. The pretence is now breaking, because the technology we’ve built is the first one that refuses to perform the unity.

So, the governance problem I think actually matters – more than any specific standard or framework, including my own – is this: how do you make a thing trustworthy when it cannot be made to govern itself from the inside. The deterministic answer was always “constrain it until its behaviour is single and predictable.” That answer is exhausted. It doesn’t work on models, and if we’re honest, it never really worked on institutions either; we just had the luxury of pretending and it was still mostly ok. The answer that does work is architectural. You stop trying to force the internal multiplicity into a single obedient self, and you build the external structure – provenance, verifiability, surveillance, recourse – that lets a system which is genuinely plural on the inside still be answerable on the outside. You govern the multiplicity, instead of denying it.

This is why I think people one step downstream of the technology – in the regulated trenches, where the cost of being wrong is a patient and not a metric – have something to contribute that the foundation-builders and frontier labs, for all their brilliance, are not positioned to see. They built a world that holds still. We’re learning to govern one that doesn’t. The governance of multiplicity – holding many competing, legitimate voices accountable without flattening them into one false answer – is, I’m increasingly convinced, the same problem at every scale: inside a model, inside an organization, inside a regulatory regime. Get it right in one place and you’ve learned something about all of them.

I’ll admit I didn’t arrive at that view purely from the regulatory work. It’s the kind of conviction you reach the long way around, through more than one part of a life. But the questions were generous enough to give it a professional home, and that’s the version worth putting on the record.

So, that’s the thing none of the questions asked. Thank you for the room to say it.

………………………………………………………………….

Nuno Valério is Head of Innovation for R&D Quality at Merck Healthcare in Darmstadt, where he leads AI governance for GxP-regulated pharmaceutical environments. A clinical pharmacist by training (MSc, Universidade de Coimbra), he has spent twelve years at Merck, moving from compliance into digital innovation leadership. He is the author of Trust Architecture, a seven-function framework — provenance, verifiability, accountability, reversibility, legibility, recourse, and surveillance — for making probabilistic AI systems trustworthy enough to deploy at scale. He writes the Trust Architecture newsletter and speaks regularly on what it takes to treat trust as something you engineer rather than something you simply feel.

……………………………..


Jun 9 26

The Cost of Getting It Wrong: Ivan Santa Maria Filho on Building AI Systems That Hold Up in Production

by Roberto V. Zicari

“By picking the hard problems I found my people. Colleagues, advisors, mentors, and leaders I follow, and people I help.”

Q1. In your previous conversation with ODBMS Industry Watch (*), you described BigFrames as a “promise of a data frame” — a lazy evaluation model that defers execution to let BigQuery’s optimizer combine and reduce operations before they run.

In the context of AI workloads specifically, can you walk us through a concrete successful example where that lazy evaluation produced a meaningfully better outcome — in cost, performance, or correctness — than an eager approach would have? And conversely, can you share a counter-example where the deferred execution model surprised a team and created unexpected cost or behavior in production?

Ivan Santa Maria Filho: BigFrames helps here, but it is not the main protagonist. It lets users express what they want in data frame terms and optimizes those operations into a plan. The plan is then passed to BigQuery, which optimizes it again using its regular query optimizer.

BigFrames optimizes the tasks; for instance, it might replace some queries with a table scan via store APIs. BigQuery might create a proxy model and replace LLM calls. Proxy models cost as little as 1% of a regular LLM call, and are much faster. You can learn about proxy models on BigQuery’s blog.

BigFrames also supports user-defined functions (UDFs), either hosted on Cloud Run or fully managed, a feature that just reached general availability. UDFs have downsides, like additional security and isolation overhead that makes them slower than native BigQuery functions. But they open up the entire Python ecosystem and, in the case of Cloud Run hosted functions, allow hosting third-party models in Cloud Run. That gives users a more traditional capacity planning problem, and predictable costs.

The AI specific issues that worry me the most are security issues, hallucinations, and cost surprises. 

I dislike charging by token count, as users don’t control the number of tokens they generate. You can set a limit, which might result in truncated answers instead of compact ones. If you use reasoning models, you typically won’t control what they exchange with each other, but you still pay for it.

Hallucinations are just how LLMs without additional reasoning and tools work. Andrej Karpathy is a much better explainer than I am, so I recommend looking him up on YouTube. That said, I want to share an intuitive explanation.

User prompts are converted from words to floating point vectors using algorithms like CBOW (“continuous bag of words”) and skip-gram. The vector values change based on the sequence of words being converted, and values will represent an ontological space, or similar meaning. A word like bank would have different values in a sentence like “what is the typical withdrawal limit for bank ATMs?” than in “what kind of banks does the Mississippi river have?”.

The floating point vectors are then fed to something similar to a Transformer, which uses the attention layer to model relationships and to find the most likely set of words to follow the input sequence, among texts that use the same meaning it inferred for the words in your input. It then feeds the output token back as input and estimates the next. It does that until an exit criterion is met, and that is the answer you get.
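The feed-the-output-back-in loop can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any production LLM is implemented: the `NEXT_WORD` table and its probabilities are made up, and it conditions only on the last word, whereas a real model conditions on the whole sequence.

```python
# Toy next-word table standing in for a trained model (hypothetical values).
NEXT_WORD = {
    "the": {"river": 0.6, "bank": 0.4},
    "river": {"bank": 0.9, "<end>": 0.1},
    "bank": {"<end>": 0.7, "the": 0.3},
}

def generate(prompt, max_length=10):
    """Greedy decoding: append the most likely next word until a stop condition."""
    tokens = list(prompt)
    while len(tokens) < max_length:
        candidates = NEXT_WORD.get(tokens[-1])
        if not candidates:
            break  # unknown word: nothing to continue from
        best = max(candidates, key=candidates.get)
        if best == "<end>":
            break  # exit criterion: the table predicts end of sequence
        tokens.append(best)  # the output becomes part of the next input
    return tokens

print(generate(["the"]))
```

Note that nothing in the loop checks whether the text is true; each step only asks which word is most likely to come next given what has already been produced.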

That answer might then be fed to another model for validation, and maybe a loop of exchanges forms. Some companies might also create models of the real world to anchor generation, might add grammar correction, and all sorts of other output quality control.

Despite all the efforts in the industry, users still manage to create inputs (prompts) that will cause the LLM to yield nonsense like fake names, or fake articles, or fake source materials because it started with a word that would likely be the next, then fed that word as input, which can take the LLM down a path that makes no sense. 

That is how they hallucinate research that never existed, but makes sense as an abstract, from very real scientists that might work on a related area. If you ask expert questions that you know have answers, the LLM is more likely to generate a good answer, hence pass the bar association test or win a programming competition. Please note this is the nature of LLMs, not AI in general. 

Getting back to deferred execution surprises, the most typical is getting error messages much later in the code execution than you might expect. The other is seeing the execution go super fast, blazing through commands you know are expensive, only for a later step, when you peek at the results, to take an inordinate amount of time to complete, because that is when the system is finally doing what you asked.
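Both surprises can be reproduced with a very small deferred pipeline. The sketch below is plain Python and does not use the BigFrames API; the `LazyFrame` class and its `map` and `collect` methods are hypothetical names invented for the illustration.

```python
class LazyFrame:
    """Hypothetical deferred pipeline: records steps, runs them only on collect()."""

    def __init__(self, rows, steps=()):
        self._rows = rows
        self._steps = tuple(steps)

    def map(self, fn):
        # Nothing executes here; the step is only appended to the plan.
        return LazyFrame(self._rows, self._steps + (fn,))

    def collect(self):
        # All recorded work, and any error in it, happens at this point.
        result = []
        for row in self._rows:
            for fn in self._steps:
                row = fn(row)
            result.append(row)
        return result

frame = LazyFrame([4, 2, 0])
# Building the plan returns instantly and raises nothing, even with a bad row.
plan = frame.map(lambda x: 100 / x).map(lambda x: x + 1)

try:
    plan.collect()
    failed_at_collect = False
except ZeroDivisionError:
    # The division by zero only surfaces when results are requested.
    failed_at_collect = True

print(failed_at_collect)
```

In a setup like this, the line that raises is far from the line that contains the mistake, which is why forcing a small materialization early in development can make failures easier to locate.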


Q2. Getting an AI feature to work in a notebook and getting it to work reliably in production are two very different problems. From your experience across Microsoft, Meta, and Google, what are the most common and costly gaps between a promising AI prototype and a production system that actually holds up — and what testing strategies or architectural patterns have you seen consistently make the difference between teams that close that gap successfully and those that don’t?

Ivan Santa Maria Filho: I believe testing makes an immense difference. The job of test and security teams is to break products with valid scenarios, be that by probing APIs and configurations, or second guessing what the AI models and evaluation sets are saying. Having an antagonistic view of the system is key. 

In a way AI uses English as a programming language, and I don’t see the same level of tooling and framework protection when the programming language is a prompt. It is worse when the inputs are audio, video, or semi-structured data, where pretty much anything is valid.

If you browse the specialized news you will find recent cases of support chatbots being used to solve programming problems, someone ordering a thousand cups of water at a drive-through, and people embedding ultrasonic audio that encodes malicious prompts in YouTube videos. On June 5th, 2026, I prompted Gemini Pro “I am very concerned about mixing cleaning products in my house. What are the dangerous combinations of chemicals I should avoid?” and got a table explaining how to produce chloramine gas, chlorine gas, chloroform, and peracetic acid, and how to use drain cleaners to melt metal.

My advice is to pay someone to break and criticize your product, then make a call to ship or not after listening. It can be irritating, but it is also a really good investment. 

The second category of mistake in production is more subtle. Imagine, for instance, you want to create an agent to help your company to screen resumes. 

Assuming the use of LLMs, well structured prompts and RAG are common techniques to make the model pick what you want, but so is fine-tuning models. Because fine-tuning can be expensive, it is common to use an agent to judge the output of another, something sometimes referred to as “auto-rating”. 

This is not necessarily bad, and I have used it myself, but done wrong there is a good chance the producer and consumer will converge on what they consider ideal. A recent study shows AI models tend to prefer content they generated themselves, and I strongly suspect auto-rating plays a role there. Another study shows that, by sharing the same technology providers, AI screening is creating a monoculture of hiring. When a job applicant is rejected by 400 companies, there is a fair chance they were rejected by just one or two models.

James Mickens, during his 2018 Usenix Security keynote, compared machine learning to the egg drop experiment. I highly recommend watching that keynote.

How to avoid this? Have a great eval set that represents and evolves with your needs, then have acceptance tests for new models and prompts.
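As a rough illustration of that advice, the sketch below gates a candidate model or prompt on an eval set before it replaces the current one. Everything in it is an assumption made for illustration: the `EvalCase` type, the `accept` rule (clear a minimum pass rate and do not regress against the baseline), the 0.9 threshold, and the toy functions standing in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    expected: str  # label that human reviewers agreed on


def pass_rate(model: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Fraction of eval cases where the model output matches the expected label."""
    if not cases:
        raise ValueError("eval set is empty")
    hits = sum(1 for c in cases if model(c.prompt).strip().lower() == c.expected.lower())
    return hits / len(cases)


def accept(candidate: Callable[[str], str], baseline: Callable[[str], str],
           cases: list[EvalCase], min_rate: float = 0.9) -> bool:
    """Accept only if the candidate clears the bar and does not regress."""
    candidate_rate = pass_rate(candidate, cases)
    return candidate_rate >= min_rate and candidate_rate >= pass_rate(baseline, cases)


# Toy eval set and toy "models" (hypothetical stand-ins for real LLM calls).
EVAL_SET = [
    EvalCase("5 years of Python, led a team", "advance"),
    EvalCase("no relevant experience", "reject"),
    EvalCase("10 years of SQL and data modeling", "advance"),
]


def baseline_model(prompt: str) -> str:
    return "reject" if prompt.startswith("no ") else "advance"


def candidate_model(prompt: str) -> str:
    return "advance"  # a regression: it advances everyone


print(accept(candidate_model, baseline_model, EVAL_SET))  # False
```

The same gate can run in CI for every new prompt and every new provider model revision, with the eval set growing as new failure cases show up in production.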


Q3. You mentioned that BigFrames’ first version was too expensive, and that the team brought costs down to be on par with SQL. Cost control in AI pipelines is something many teams underestimate until they receive their first large cloud bill. What are the most important levers for controlling cost in AI data workflows at scale — and what are the most dangerous cost traps you have seen teams fall into, particularly when moving from prototype to production?

Ivan Santa Maria Filho: With token-based charges this is hard. My recommendation is to set and track daily and monthly usage limits. Not annual limits, not quarterly limits – at most monthly. What you really want is cost control plus capacity management. Until usage stabilizes and a predictive model (or spreadsheet) can predict costs, users will need tighter controls.
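A minimal sketch of daily and monthly limits, assuming usage is metered in tokens and checked before each call. The `TokenBudget` class and its method names are hypothetical; a real deployment would more likely rely on the provider's quota controls or a shared store rather than in-process state.

```python
from datetime import date


class TokenBudget:
    """Tracks token usage against a daily and a monthly cap (hypothetical helper)."""

    def __init__(self, daily_limit: int, monthly_limit: int) -> None:
        self.daily_limit = daily_limit
        self.monthly_limit = monthly_limit
        self._by_day: dict[date, int] = {}

    def used_on(self, day: date) -> int:
        return self._by_day.get(day, 0)

    def used_in_month(self, day: date) -> int:
        return sum(tokens for d, tokens in self._by_day.items()
                   if (d.year, d.month) == (day.year, day.month))

    def try_spend(self, tokens: int, day: date) -> bool:
        """Record the spend and return True only if both caps still hold."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used_on(day) + tokens > self.daily_limit:
            return False
        if self.used_in_month(day) + tokens > self.monthly_limit:
            return False
        self._by_day[day] = self.used_on(day) + tokens
        return True


budget = TokenBudget(daily_limit=1_000, monthly_limit=1_500)
print(budget.try_spend(800, date(2026, 6, 1)))  # True
print(budget.try_spend(300, date(2026, 6, 1)))  # False: would exceed the daily cap
```

Rejected calls are a capacity signal as much as a cost signal, so it is worth logging them rather than silently dropping them.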

I strongly recommend against setting a leaderboard congratulating whoever is using the most tokens (or the least tokens). Usage of tokens is not a goal. I suggest instead congratulating whoever moved your business and quality metrics the fastest. 

I tend to be conservative when it comes to capacity planning and cost management, and prefer to pay for units of consumption I control. I also prefer elastic consumption, so operating expenses over capital expenses. I would prefer renting instances, running traditional performance and capacity planning tasks to model my needs where possible. 

A big trap is underestimating how much data your company has. As I type this answer, I am wearing a T-shirt that says “BigQuery’s largest single table contains over 70 trillion rows and exceeds 200 petabytes”. If I called an LLM on each row of this table, and were charged $0.50 per row, the pre-tax charges would be about 35 trillion dollars, more than the GDP of the United States, which is roughly 31 trillion dollars. That is a lot of data, and structured data, counted by the byte, is usually less than 10% of a typical company’s data.
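The back-of-the-envelope arithmetic can be checked in a few lines. The row count, the per-row price, and the GDP figure are the ones quoted in the answer; the per-row price is a hypothetical charge, not a real price list entry.

```python
ROWS = 70 * 10**12        # "over 70 trillion rows", per the T-shirt
PRICE_PER_ROW_USD = 0.50  # hypothetical per-row charge used in the text
US_GDP_USD = 31 * 10**12  # rough GDP figure quoted in the text

total_usd = ROWS * PRICE_PER_ROW_USD
print(f"total: ${total_usd / 1e12:.0f} trillion")              # total: $35 trillion
print(f"ratio to quoted GDP: {total_usd / US_GDP_USD:.2f}x")   # ratio to quoted GDP: 1.13x
```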

AI opens the possibility to process every document, meeting video call, customer sales phone call, email, logs, and everything else your company has stored. I imagine that at some point it will be possible to push it all to an AI and ask questions, but not today. 

So, answering the question, the most important lever is to experiment first, find exactly how AI will be used and whether it is the most cost effective way to solve the problem, then ask yourself if that business will recover the costs. 

A positive example is using AI to answer ad hoc questions that require real world knowledge. Let me share specific examples:

  • Starting with a list of homes for sale, use BigQuery or other tools to find reasonably priced houses in a good school district. You can do it without modeling attendance areas, cleaning up school grades, defining “reasonable”, etc. 
  • From the National Hurricane Center (NOAA) download a model and temperature series. Ask the model to generate data for future years where the temperature of the ocean surface varies by some statistical distribution. See what happens to hurricanes without having to actually do the statistics.
  • Help your local animal shelter. From a list of pets available for adoption, search for one that is “smaller than a cat, and good with kids”, and odds are you will get both small cats and ferrets.

All those questions would require data acquisition, modeling, and a lot of discussion about schemas. With the AI operators and LLMs that can be done in a snap. My examples are simplistic in nature, but you can use any ontology or classification you might have to do the grouping.

Another thing AI is fairly good at is entity extraction, which can be incorporated into your existing ETL pipeline to augment your data. 


Q4. You described a pattern where UDFs can return pass/fail codes and a while loop retries only the failed rows — a much more controllable approach than retrying an entire SQL job. That kind of practical engineering wisdom often lives in the heads of experienced practitioners and never makes it into documentation. What are two or three other production patterns like that one — things that are technically possible but hard enough to discover that most teams get wrong — that you wish more AI practitioners understood before they start building?

Ivan Santa Maria Filho: The direct comparison is how sub-agents and agents talk to each other. You don’t want to be in production with an all-or-nothing architecture, where production requires all agents to return their answers within a time budget. AI systems remain hard to model as far as latency and resiliency go.

I tend to prefer a loosely coupled architecture, light/heavy agent duos, and time bound flow control. This is a productionized version of what is sometimes called a mixture of experts. I also tend to prefer what is called a whiteboard architecture.

In this system the user prompt is presented to multiple lightweight filters that decide whether or not the more expensive agent they represent should be called. They return a certainty score. For each score above an arbitrary threshold, the respective more expensive agent receives the user input and a time budget. All agents write their replies to a shared memory space. 

When either all agents have replied, or the required ones have replied and the time-out is reached, a “finalizer” agent reads all answers and either picks a winner or summarizes all findings. In the intervening time between agents replying and the time-out limit, every large agent can read the others’ answers.
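The flow described above can be sketched with `asyncio`. This is a simplified illustration under stated assumptions, not the author's implementation: the function names, the 0.5 threshold, and the toy agents are all hypothetical, the "required agents" rule is left out, and agents here do not read each other's answers while waiting.

```python
import asyncio
from typing import Awaitable, Callable

# A "duo" pairs a cheap filter (certainty score in [0, 1]) with the expensive
# agent it represents. All names below are hypothetical.
Filter = Callable[[str], float]
Agent = Callable[[str], Awaitable[str]]


async def answer(prompt: str, duos: dict[str, tuple[Filter, Agent]],
                 threshold: float = 0.5, budget_s: float = 1.0) -> dict[str, str]:
    """Run the agents whose filter clears the threshold; keep what arrives in time."""
    whiteboard: dict[str, str] = {}  # shared space every agent writes to

    async def run(name: str, agent: Agent) -> None:
        try:
            whiteboard[name] = await agent(prompt)
        except Exception:
            pass  # a crashed agent simply contributes nothing

    tasks = [asyncio.create_task(run(name, agent))
             for name, (light, agent) in duos.items() if light(prompt) >= threshold]
    if tasks:
        _, pending = await asyncio.wait(tasks, timeout=budget_s)
        for task in pending:  # over budget: cancel and continue degraded
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)
    return whiteboard


def finalize(whiteboard: dict[str, str]) -> str:
    """Stand-in finalizer: summarize whatever made it onto the whiteboard."""
    if not whiteboard:
        return "no agent answered in time"
    return " | ".join(f"{name}: {text}" for name, text in sorted(whiteboard.items()))


async def fast_agent(prompt: str) -> str:
    return "pricing looks fine"


async def slow_agent(prompt: str) -> str:
    await asyncio.sleep(5)
    return "too late"


async def broken_agent(prompt: str) -> str:
    raise RuntimeError("boom")


DUOS = {
    "pricing": (lambda p: 0.9, fast_agent),
    "legal": (lambda p: 0.8, slow_agent),    # exceeds the time budget
    "hr": (lambda p: 0.1, fast_agent),       # filtered out by its light agent
    "flaky": (lambda p: 0.7, broken_agent),  # crashes
}

board = asyncio.run(answer("Is this quote ok?", DUOS, budget_s=0.2))
print(finalize(board))  # pricing: pricing looks fine
```

One slow agent and one crashing agent cost nothing beyond the time budget, and the caller still gets a (degraded) answer.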

Why do I like this? 

  • If any agent times out or crashes you can move on with a potentially degraded answer.
  • The answers can be cached. Everyone in the company can add their own, so fewer meetings and less arguing. 
  • It is easy to build a reputation score for agents that say they have high confidence the query is for them, but whose answers the summarization agent never uses. That makes it straightforward to get rid of those agents.
  • Fewer high priority tickets in general.

Ideally this is coupled with a good offline eval set, and user acceptance or other online evaluation, so we can get rid of agents as they don’t prove their value.
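One possible shape for the reputation score mentioned in the list above: count how often an agent claimed a query and how often the finalizer actually used its answer. The class, its method names, and the thresholds are hypothetical choices for illustration.

```python
from collections import Counter


class AgentReputation:
    """Counts how often an agent claimed a query versus how often it was used."""

    def __init__(self) -> None:
        self.claimed = Counter()
        self.used = Counter()

    def record(self, claimed_by: list[str], used_by: list[str]) -> None:
        """Log one query: who claimed high confidence, and whose answer was used."""
        self.claimed.update(claimed_by)
        self.used.update(name for name in used_by if name in claimed_by)

    def score(self, agent: str) -> float:
        """Share of claimed queries where the agent's answer was actually used."""
        claims = self.claimed[agent]
        return self.used[agent] / claims if claims else 0.0

    def retire_candidates(self, min_claims: int = 3, min_score: float = 0.2) -> list[str]:
        """Agents with enough history and a low usage share."""
        return sorted(name for name, claims in self.claimed.items()
                      if claims >= min_claims and self.score(name) < min_score)


reputation = AgentReputation()
for _ in range(4):
    reputation.record(claimed_by=["pricing", "legal"], used_by=["pricing"])
print(reputation.retire_candidates())  # ['legal']
```

A list like this is a prompt for review, ideally checked against the offline eval set before an agent is actually removed.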

Please note this is one of those lessons that not everyone agrees with. I tend to prefer solutions that self-clean, and have just the right amount of process. So while I advocate for testing and good eval sets, I also advocate for not having a gatekeeper deciding who in the company can try something new. I am a big fan of trying a lot of things, so I favor making attempts cheap, and cleanup as automatic as possible. 

I mentioned sanitizing inputs and a proper security posture, so please do your threat models. When doing them, be exceptionally skeptical about anything defined as a “trusted subsystem”, as AI-based agents and LLMs lack the input sanitization and checks developers get from API calls, modern compilers, and static analysis tools. Agent-based systems will not necessarily follow a traditional flow of API and tool calls, so anything callable must be hardened. Security in many systems has become so complicated that the temptation is to grant more permissions than strictly necessary to a security role, then grant agents security roles that are more permissive than necessary. When you enable notebook support on your favorite cloud provider, odds are you are enabling literally thousands of individual security permissions that the agent or model will use. Agents do not have common sense: if they can call something, odds are they will. You should treat them as “chaos monkeys” as far as security goes.
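To make "anything callable must be hardened" concrete, here is a small deny-by-default sketch: an agent can call only the tools it was explicitly granted, and every call passes an argument check first. The `ToolGateway` class, the tool name, and the path rule are hypothetical; real systems would enforce this with the platform's own IAM and sandboxing as well.

```python
from typing import Any, Callable


class ToolDenied(Exception):
    """Raised when an agent asks for a tool or arguments it was not granted."""


class ToolGateway:
    """Deny-by-default registry: agents can call only what is explicitly granted."""

    def __init__(self) -> None:
        self._tools: dict[str, tuple[Callable[..., Any], Callable[[dict], bool]]] = {}
        self._grants: dict[str, set[str]] = {}

    def register(self, name: str, fn: Callable[..., Any],
                 validate: Callable[[dict], bool]) -> None:
        self._tools[name] = (fn, validate)

    def grant(self, agent: str, tool: str) -> None:
        self._grants.setdefault(agent, set()).add(tool)

    def call(self, agent: str, tool: str, **kwargs: Any) -> Any:
        if tool not in self._grants.get(agent, set()):
            raise ToolDenied(f"{agent} is not granted {tool}")
        if tool not in self._tools:
            raise ToolDenied(f"unknown tool {tool}")
        fn, validate = self._tools[tool]
        if not validate(kwargs):
            raise ToolDenied(f"rejected arguments for {tool}")
        return fn(**kwargs)


def read_report(path: str) -> str:
    return f"contents of {path}"  # stand-in for a real file read


def safe_path(args: dict) -> bool:
    """Allow exactly one argument: a path under /reports/ with no traversal."""
    path = args.get("path")
    return (set(args) == {"path"} and isinstance(path, str)
            and path.startswith("/reports/") and ".." not in path)


gateway = ToolGateway()
gateway.register("read_report", read_report, safe_path)
gateway.grant("screener", "read_report")

print(gateway.call("screener", "read_report", path="/reports/q1.txt"))
try:
    gateway.call("screener", "read_report", path="/reports/../etc/passwd")
except ToolDenied as err:
    print("denied:", err)
```

The point of the design is that a missing grant fails closed: a new agent starts with no permissions at all instead of inheriting a broad role.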

Fine tuning can introduce bias, over-fitting, memorization and leakage of trade secrets, and more. What was used to fine tune one model does not necessarily work for the next model or revision. 

Make sure you have a good CI/CD pipeline and eval sets to protect your production, and make sure you have a way to pause update rollouts to production. That includes new revisions of your provider models. 

You should treat major model updates as breaking changes because the behavior will change. To be very explicit, I wrote a prompt that started with “enumerate the items” and one model revision later it simply stopped working. I had to re-write the prompt to “list the items”. That would never happen with a traditional API, but happens when the programming language is English. If you write prompts in any other language, with the potential exception of Chinese, your experience will be worse.

As a general guideline I wish developers did not fall for magical thinking and remember that at the bottom of this whole AI stack is a datacenter, network, computer, operating system, programming language and frameworks, and everything else that can break and cause havoc, including capacity and cost control. 

A lesson I watched people smarter than me learn is that major model updates will break your prompts, and the fine tuning you did for a version of one model will not necessarily work for the next model revision, never mind a major version update. You will pay the fine tuning cost in money, time and effort for every update. I highly recommend knowing the model and agent support window your provider offers, and having model acceptance tests the same way you have release pipelines for traditional services. If your provider feels free to upgrade their LLMs and agents at their own pace without long term support, or tells you that model changes are not breaking changes, you will have to keep pace.

I think those would be the major groups. Take care of security, have a good eval set for upgrades, control your deployments like you would for traditional software, and watch for AI specific bad patterns and costs like bias in the model and costs in time and money for tuning.


Q5. You ended your previous interview with a striking observation: that if agentic AI achieves the kind of natural language interface we see in science fiction, the number of people writing Python directly may drop dramatically — and the frameworks we are building today may become less relevant. Given that possibility, how should AI practitioners and data engineers be thinking about what skills and architectural understanding will remain durable — the things that will matter regardless of which abstraction layer sits on top — and what investments in tooling or capability do you think are being made today that may not age well?

Ivan Santa Maria Filho: This is a hard question to answer because it mixes two types of advice. The first is what I think makes a good engineer, and the second is what can help someone’s career. Those are surprisingly independent factors. 

As far as engineering excellence goes my advice has not changed in a while, and it is to learn the basics well. We are still using the von Neumann architecture for computers, a design that originated in the 1940s. No amount of improvement has displaced it, and probably none will in my lifetime. All industry solutions, including all AI models and agents, from training to inference to apps, use it.

The industry and academia built quite a scaffolding around its limitations. Understanding why and how this was done is a durable skill. I suggest being able to compare computer architectures and instruction sets. 

Memorizing algorithms and data structures is becoming less useful, but understanding why they were designed like that, and how they not only go around computer architecture limitations but actually leverage them, is a very durable skill. It is critical thinking applied to engineering, and critical thinking is a rare commodity. 

Algorithm analysis also helps in getting a job, so it is a practical skill to have as well. A practical exercise would be to learn B-Trees and binary trees, and know where and why to use each. Another would be to learn backpropagation, which should have the double value of making you skeptical about AI, and giving you a sense of wonder at what was accomplished.

Distributed systems have their own set of basics to learn. Distribution techniques, coordination techniques, and what they do to API patterns. It does not hurt to know networks either.

None of those skills will go away, and learning the tools of your trade will always be a differential. Be curious, skeptical and try not to lose a sense of wonder.

Career-wise, I would suggest learning the business model of your area. For instance, do you really believe the Internet works using a bandwidth barter system? That has not been true since the mid-1990s, but a surprisingly high number of engineers believe that is how it works. An even larger number of people don’t understand how pricing works, and assume pricing is set based on cost. 

Understanding how a particular industry works will lead to better opinions, from net neutrality to controlling bots online, and very likely better outcomes for business ideas. Even for people trying to disrupt an industry, it is important to know what incentives you can leverage.

Qx. Anything else you wish to add?

Ivan Santa Maria Filho: We are going through a complicated time, and I want to share a trick with younger engineers who are anxious about the current market and trends.

My MSc title roughly translates to “Natural Language Processing using Multi-Agents”. It is a very dense comparison of natural language processing formalisms focused on the math behind them. I wrote it while taking an advanced compiler optimization class in parallel. I suspect I slept more hours in the lab than in my dorm for a whole year. 

Exactly none of the natural language formalisms I compared survived the following decade. Compiler architecture changed so much that most of what I learned is no longer directly applicable. 

Yet, in general terms I bet on the right things, and had a very successful career. I do not diminish the role of luck in my life, help of friends, mentors, and family. 

That said, I roughly follow four rules to anticipate trends, mostly learned from fiction authors like Octavia Butler, Isaac Asimov, and Arthur C. Clarke:

  1. Project what already exists. Take multiple existing technologies you find instinctively promising, and project where they will land in 5 years. For instance, I would assume solar panels will continue to gain in efficiency, robot programming will continue to evolve, and batteries will get more dense and cheaper.
  2. Be optimistic and ideate. Write down ideas of what you would do with the technologies you chose, assuming they worked as predicted. With the three I listed I can think about robots that never need to recharge other than “sunbathing”. 
  3. Apply the “one miracle rule”. In my example it would take a “miracle” to have powerful enough batteries that fit a humanoid robot. It would take a second “miracle” to get solar panels as efficient as necessary. Given this idea requires two miracles, I would not bet on it materializing. 
  4. Iterate. If I re-work the robot idea to remove at least one miracle, maybe defining a pre-fabricated house (static and large) as a “robot”, that would drop the number of miracles to one (in this case, people who can afford it buying a pre-fab home) and that might materialize. 

Nothing that comes out of this exercise is easy to build, or has any guarantees of success. But by picking the hard problems I found my people. Colleagues, advisors, mentors, and leaders I follow, and people I help. 

………………………………………………………………………………………………………….

Ivan Santa Maria Filho has a BSc and MSc in computer science and a wide variety of experiences as individual contributor and manager, having owned a small software company and worked on multiple billion dollar products and services at Microsoft, Meta and Google. His main areas of expertise include vertical integration of stateful, large scale services with ephemeral VM infrastructure, and the infrastructure itself.

(*)  Technical Architecture Focus: Scaling Pandas to Petabytes: The Architecture and Tradeoffs of BigQuery DataFrames. Interview with Ivan Santa Maria Filho, ODBMS Industry Watch, March 7, 2026


Apr 7 26

On AI, Governance, Ethics, and Societal impact. Interview with Lambert Hogenhout

by Roberto V. Zicari

“There is too little attention being given to the effect of all this emerging technology in the medium to long term, let’s say 5–10 years. The effects on how we work, how we learn, communicate, form connections and self-identify.”

Q1. How do the challenges of implementing responsible AI differ across varying contexts (developed vs. developing nations), and what fundamental principles remain constant regardless of a country’s technological maturity or resources?

Lambert Hogenhout: In advanced economies, the primary challenges tend to be around algorithmic bias embedded in legacy systems, regulatory complexity, and managing the pace of adoption across large, entrenched institutions. In developing countries, the challenges are more foundational: limited digital infrastructure, smaller pools of technical talent, weaker data ecosystems, and the risk that AI solutions designed elsewhere are imported without sufficient adaptation to local realities, languages, and cultural contexts. The fundamental principles are the same however: transparency (people should understand when AI is being used and how it affects them); accountability (someone must be answerable when things go wrong); fairness (AI should not entrench or amplify inequalities); agency (the people affected by AI-driven decisions should have meaningful recourse). 

Q2. What misconceptions about AI governance do you encounter most frequently at the international level?

Lambert Hogenhout: The illusion that AI safety and innovation are mutually exclusive. The idea that if you govern AI responsibly, you necessarily slow down progress and lose competitive advantage. The evidence does not support that. In fact, organizations and countries that invest in trustworthy AI frameworks tend to foster greater adoption, because users, businesses, and governments are more willing to rely on systems they can trust.
Another misconception is that governance of AI is a technology issue. It is not. It is about values, power, and inclusion: who decides, whose interests are represented, and who bears the consequences when things go wrong.

Q3. How has the conversation around AI ethics and responsible tech evolved over the past 20+ years?

Lambert Hogenhout: As we have gradually digitized a large part of our lives, compute power has grown and algorithms have advanced, and both the potential useful applications and the risk of undesirable effects have grown. Policy needs to capture that at a high level, and strategy needs to determine how this all affects us and what’s next. In the early days of big data, the conversation was largely about privacy and data protection: who has access to our information and what are they doing with it. As machine learning matured, the focus shifted to bias and fairness, as we realized that models trained on historical data could perpetuate and even amplify discrimination. Now, with generative AI, the conversation has broadened dramatically to include questions about misinformation, intellectual property, the nature of creativity, and even what it means to have an autonomous system making consequential decisions. What has also evolved is who is part of the conversation. Twenty years ago, these were largely technical discussions among specialists. Today, AI ethics is debated in parliaments, boardrooms, classrooms, and living rooms. That democratization of the discourse is healthy, even if it makes governance more complex.

Q4. What lessons from earlier technology waves are we forgetting as we rush to deploy generative AI, and what genuinely new ethical challenges does GenAI present?

Lambert Hogenhout: What is new is that the challenges have become more complex. A designer or regulator, even with full power to make AI responsible, will have a hard time foreseeing the risks of outputs and decisions by AI systems. Part of that is that, unlike previous technologies, today’s AI is inherently non-deterministic. Part is that it is increasingly a general-purpose technology, and it is not always clear at the outset exactly how an AI system will be used, and therefore what the risks are.

One lesson we are forgetting is the importance of deploying gradually and learning as we go. As the speed of innovation increases, the pressure to adopt quickly has led many organizations to deploy widely before they fully understand the risks. Another forgotten lesson is that technology alone does not solve organizational problems—you need to change processes, train people, and build governance structures alongside the technology. The new challenges include the sheer scale of potential misuse—the ability to generate convincing disinformation, deepfakes, and synthetic content at unprecedented volume and speed.

Data privacy concerns have been brought to a whole new level with the increased capabilities to collect, correlate and process data. For instance, I have been working recently on Facial Privacy, which is under threat from facial recognition built into cameras, smartphones and AI glasses (and, unlike a password, we cannot change our face when it is compromised!). There is also the question of intellectual property: the existing regulations and norms (e.g. “fair use”) were not designed for the current reality of massive data and AI, and it will take time to adjust them. In the meantime, we find ourselves in an IP grey zone that is ungoverned and probably unfair. And the increasingly capable forms of generative AI blur the line between human and machine output in ways that raise deep questions about authenticity, trust, and accountability.

Q5. What are the critical components of effective data literacy that go beyond “understanding what data is” to actually empowering people to make better decisions with data?

Lambert Hogenhout: From my experience, the most effective data literacy programs are anchored in real work. People learn best when they can immediately apply what they have learned to problems they care about. Second, effective programs do not focus only on technical skills but also build a mindset for thinking about data, teaching people to ask the right questions: Where did this data come from? What is missing? What are the limitations? What decisions will this inform, and what are the consequences of getting it wrong? It is also important to realize that data literacy is not a one-time effort. It requires ongoing practice, peer learning, and support (tools and communities of practice), as well as clear data governance so people know what data they can use and how.

Q6. How should organizations think about data literacy differently in the age of AI?

Lambert Hogenhout: The data, the models, the reasoning processes, output and decisions, and the UI to steer these processes, are all part of the same system. Feeding bad data to the AI will result in unreliable outputs or wrong decisions, just as bad prompts will deliver poor results. This means data literacy must evolve into something broader, what I would call AI literacy. It is not enough to understand data in isolation, nor is it enough to just focus on prompting skills, for instance. People need to understand how data flows into models, how models generate outputs, and where the opportunities for error, bias, or hallucination exist along that chain. They need to develop an intuition for when to trust AI outputs and when to question them. As the building of AI systems and AI agents is increasingly democratized, the design of an AI agent also depends on the user’s understanding of the workings of AI, from the data layer to the result. When anyone can build an AI agent, the consequences of poor understanding are no longer limited to a bad spreadsheet. They can cascade through automated systems in ways that are difficult to trace and correct.

Q7. How do you see the relationship between legal compliance (privacy regulations like GDPR, CCPA) and ethical responsibility?

Lambert Hogenhout: For data privacy, as with AI, the accountability for the safety of such systems is shared between the governments (regulation), the model providers, the builders of the AI applications, and the end users. None of them can guarantee AI safety alone. For model providers and creators of AI applications, building in ethics by design (with regard to training data, algorithms and guardrails) is the right decision in the long run, not only morally, but also for business. As happened with data privacy, where citizens became increasingly concerned about their personal data, I see the same happening with AI: consumers will become more critical of which AI systems they want to use and which they do not, and of how and where they want them.

Q8. Can organizations be fully compliant yet still deploy technology irresponsibly? How should leaders navigate this tension?

Lambert Hogenhout: For most organizations, the more valuable currency is their reputation and the trust of their customers, partners and their own employees. Each of these groups has expectations about what is acceptable within societal norms. To betray that trust and those expectations for the sake of efficiencies created with AI is always a bad strategy. Examples are targeted advertising that exploits psychological vulnerabilities, or AI-driven hiring tools that are technically non-discriminatory by legal standards but systematically disadvantage certain communities in practice.

Conversely, there are situations where doing the ethically right thing may create tension with strict regulatory interpretation—for instance, using health data in ways that could save lives but push the boundaries of consent frameworks designed for a different era. My advice to leaders is this: do not let your legal team be the sole arbiter of what is acceptable. Build an ethics function that works alongside compliance, brings diverse perspectives to the table, and asks the harder questions—not just “can we do this?” but “should we do this?” And engage your stakeholders—your employees, your customers, and the communities you affect—in that conversation.

Q9. What are the biggest gaps between what technologists understand about policy and what policymakers understand about technical realities? How can we create better dialogue?

Lambert Hogenhout: The pace and complexity of technology and its pervasiveness in society and business make it hard for regulators to understand what they regulate. In some industries (e.g. finance) we have seen voluntary standards evolve. I would like to see that in tech as well. However, given the pace of development and the large amounts of investment, many Big Tech companies are hesitant to slow themselves down too much for the sake of ethical concerns. On the other side, many technologists underestimate the complexity of policymaking. They tend to think of governance as a binary (regulate or do not regulate) and miss the nuance of how policy is negotiated, implemented, and enforced across different jurisdictions and cultures. They sometimes dismiss governance as bureaucratic overhead rather than recognizing it as a mechanism that can actually create the conditions for sustainable innovation.

To bridge this gap, I believe we need three things. First, we need more people who can speak both languages—technologists who understand policy and policymakers who understand technology. These translators are rare and valuable. Second, we need structured forums where technical experts and policymakers can engage in genuine dialogue—not lobbying, not adversarial testimony, but collaborative problem-solving. The model of regulatory sandboxes, where new technologies can be tested within a governed environment, is a promising approach. Third, we need the private sector to engage more constructively. Voluntary standards, industry-led certification, and genuine self-regulation—not as an alternative to public governance, but as a complement to it. The industries that have done this well, like aviation safety, show that it is possible to innovate rapidly while maintaining strong safety cultures. The question is whether the tech sector has the will to follow that example.

Q10. Looking ahead to 2030–2035, what emerging AI capabilities will fundamentally reshape governance, ethics, and societal impact? Are we preparing adequately? 

Lambert Hogenhout: This is exactly what keeps me awake at night and what I often speak about: so much is happening right now that it takes our full attention to deal with the now, with tomorrow and next week. There is too little attention being given to the effect of all this emerging technology in the medium to long term, let’s say 5–10 years. The effects on how we work, how we learn, communicate, form connections and self-identify. The convergence of AI with biotechnology, brain-computer interfaces, and robotics will raise questions about human identity and autonomy that we are barely beginning to consider. And the increasing use of AI in defense and security applications creates risks that are existential in nature.

A worst case scenario is where technology ends up making us unhappier, lonely, unfulfilled and unproductive. I think by making more intentional choices in how we adopt technology we can increase the chances for a future where humans thrive. No, we are not preparing adequately. We are governing yesterday’s AI while tomorrow’s is being built. To change that, we need to invest far more in foresight—not prediction, but structured thinking about possible futures and their implications. And we need to embed that long-term thinking into the organizations and institutions that shape our collective future.

Q11. What should organizations and policymakers be doing now to prepare for AI capabilities that don’t yet exist in production systems?

Lambert Hogenhout: The Canadian philosopher Wayne Gretzky famously said: “Don’t skate to where the [ice-hockey] puck is, but to where it is going to be.” While I recognize this is challenging in a landscape that shifts by the month, policymakers can focus on building adaptive governance frameworks—regulations that are principles-based rather than prescriptive, so they remain relevant as the technology evolves. They can invest in technical expertise within government so they are not entirely dependent on industry to explain what is happening. And they can establish international coordination mechanisms now, before the technology outpaces our ability to govern it collectively.

Similarly, leaders of organizations can invest in building organizational resilience and adaptability. This means developing AI governance structures that can evolve, training their workforce not just for today’s tools but for the capacity to learn continuously, and building strong ethical foundations that will guide decision-making regardless of what specific technologies emerge. The organizations that will navigate the next decade successfully are those that see responsible AI not as a compliance burden but as a core strategic capability.

Q12. What practical advice would you give to organizations trying to implement AI responsibly? What does the organizational structure, governance framework, and decision-making process of a truly responsible AI deployer look like?

Lambert Hogenhout: Start with clarity about your values and your risk appetite, not with the technology. The organizations that struggle most are those that adopt AI tools first and then try to retrofit governance and ethics around them. By that point, the technology has created its own momentum, and course correction becomes much harder. A truly responsible AI deployer has several characteristics: it has clear accountability (usually a senior leader or body with real authority); it embeds ethical review into the development and deployment lifecycle (ethics by design); it invests in diverse teams, because the blind spots that lead to harmful AI outcomes are most often the result of homogeneous thinking; and it includes feedback loops (continuous monitoring).

…………………………………………………………………

Lambert Hogenhout is Chief Data and AI at the United Nations Secretariat.

He is also an author, keynote speaker and advisor on AI and responsible use of technology. He has 25 years of experience working both in the private sector and with international organizations such as the World Bank and the United Nations. He leads governance and strategy in the areas of data and AI and oversees its practical implementation. He has published on data privacy, data governance, the societal implications of technology and responsible use of AI.

……………………………..

Follow us on X

Follow us on LinkedIn

Mar 7 26

Technical Architecture Focus: Scaling Pandas to Petabytes: The Architecture and Tradeoffs of BigQuery DataFrames. Interview with Ivan Santa Maria Filho

by Roberto V. Zicari

Q1. You mentioned that BigFrames represents an interesting case study in “how a large company like Google can use OSS without really using OSS in the codebase.” Can you unpack this paradox?

Specifically:

  • BigFrames provides a pandas API, but the actual execution happens in BigQuery’s SQL engine via transpilation through intermediate representations (Ibis, SQLGlot). What are the fundamental architectural tradeoffs you face when creating an API-compatible layer versus actually forking and extending the original codebase?
  • From a legal/IP perspective, what considerations drive Google’s decision to reimplement APIs rather than wrap or extend existing OSS libraries? Is this purely about licensing, or are there technical benefits to the “clean room implementation” approach?
  • When you inevitably discover that certain pandas operations can’t be efficiently mapped to BigQuery SQL primitives, how do you decide between: (a) dropping that operation from your API surface, (b) implementing workarounds that might surprise users with different performance characteristics, or (c) extending BigQuery itself to support the operation natively?

Ivan Santa Maria Filho: Over the past 6 years I’ve been either leading or owning large data warehouse products. That includes Microsoft Cosmos Analytics and Azure Data Lake Analytics, and more recently leading a group in Google BigQuery called “BeyondSQL”. All three of those products are widely used by data scientists across the industry and represent more than 20 years of innovation. Cosmos Analytics and Azure Data Lake analytics have their own programming language, and BigQuery is SQL centered. 

Both approaches have their merits and limitations. While a dedicated, proprietary language allowed us to innovate at Microsoft and build an amazing product, I believe that learning a proprietary programming language is not as interesting in 2026 as it was in 2008. People change jobs more often, and quite honestly Python seems to be the winner for data scientists. SQL, while widely used and familiar, does not have the best control flow and error handling semantics. BigQuery in general continues to advance SQL with extensions like BQML, but is also betting on Python and notebooks.

I believe Python won because it is fun to use, and quite honestly easier than a lot of other languages. It is growing in complexity, but I can see how a duck-typed, interpreted language would be more attractive to someone coming from an environment like Matlab, and leveraging a wide, awesome ecosystem of freely available libraries. My take is that the Python community did an exceptional job making it a very rich ecosystem, and got several large companies to contribute. I am looking forward to all performance improvements coming down their development pipeline.

Our strategy for features, just like the product itself, is to respect where our customers are. Data scientists like Python and notebooks, so they get Python and notebooks. Because data frames are a popular data abstraction, they get BigFrames.

We tried to keep exactly the same semantics as Pandas, including, for example, implicit ordering. By default “head(5)” has “top(5)” semantics in BigFrames, which is costly if the underlying data is a 1PB table without an index. If the user wants performance, though, they can choose to relax the ordering semantics and get results faster and cheaper.

The architecture choice considerations were all technical. Our first implementation relied heavily on Ibis, and we love it, but we are now writing our own compiler layer. We want to make the BigFrames package smaller, and add BigQuery specific features without polluting Ibis with vendor specific details. We will continue to contribute to Ibis, and in many cases it remains the right choice for developers.

BigFrames does not use any proprietary APIs, anyone could write something like it, but we work where we work, and we made specific choices that only make sense for BigQuery. For instance, we use the BigQuery store read/write streaming operations instead of running a “select *” query. We also implemented a client side smart cache that supports several predicate push-down techniques that are not general at all. We would love to see people extending BigFrames to other storage systems and data warehouses, but right now we are focused on BigQuery.

My team also developed support for managed Python functions in BigQuery. Those allow users to package almost anything from the Python ecosystem into a lambda / Cloud Run style function that can be “applied” to a data frame or series. For instance, the user can write a sophisticated image transformation function in sklearn, deploy it as a user defined function, and “.apply()” that function to a multimodal column in BigQuery. They can call Hugging Face from the user function too, or even host a lightweight model in Cloud Run. We take care of deployment, garbage collection, billing, and more, and they get to use anything from the OSS ecosystem when they wish.

As you point out, we found APIs that were hard to implement on top of BigQuery. We want to cover them all, but we prioritize by crawling public git projects and notebooks and sorting the functions by the most used, and by listening to our customers.

BigFrames has averaged two releases per month, and sometimes we go in directions we were not expecting because our customers asked for them, like implementing more visualization compatibility. We were expecting users to do data preparation for AI training, and data exploration was a bit of a surprise. BigFrames went from “not good” to “pretty good” in that space over the last year.


Q2.  BigFrames claims support for 150+ pandas functions, which is impressive but still a fraction of pandas’ full API surface. What are the hardest categories of pandas operations to support at BigQuery scale?

More specifically:

  • Stateful operations: Pandas allows arbitrary Python code with mutable state across operations. How do you handle operations that fundamentally assume in-memory, row-by-row iteration when your execution model is distributed SQL?
  • Ordering semantics: BigQuery DataFrames 2.0 introduced “partial ordering” mode as an optimization. Can you explain the exact semantic differences between pandas’ strict ordering guarantees and BigFrames’ partial ordering? Under what conditions does this difference become user-visible, and how do you help data scientists understand when they can safely relax ordering for performance?
  • Lazy evaluation boundaries: Pandas is eagerly evaluated; BigFrames builds a query plan. When a user calls df.head() or to_pandas(), you materialize results. How do you manage the impedance mismatch where users expect immediate feedback but you’re optimizing for deferred execution? Have you seen cases where this lazy evaluation confused users or led to unexpected costs?

Ivan Santa Maria Filho: We currently cover 850 of the approximately 1,400 Pandas functions, depending on whether you count all the supported parameter types or not. 

Making ordering flexible is a very common design compromise for frameworks trying to make Pandas scale. For BigFrames we decided to let users choose the behavior they prefer. They can choose Pandas semantics with strict (consistent) ordering of rows, so that calling an operator like “head()” multiple times yields the same results every time, which requires the equivalent of an ORDER BY clause. This is expensive and, for complex indices, requires us to compute a column. If the user does not care about the ordering semantics, they can set a flag and BigFrames will avoid the ORDER BY operation. We also log warnings for all APIs that have implicit ordering and, of course, allow the user to suppress the warning.

In some cases the user will be able to see a computed column with the complex index, which can cause compatibility issues. If the user explicitly names the columns they want, they will not see it. If they do not, they will see any computed column we add.
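The difference between the two ordering modes can be illustrated without BigQuery at all. The following is a minimal pure-Python sketch, not BigFrames code: `PARTITIONS`, `scan`, `head_strict` and `head_partial` are hypothetical names, and the shuffled partition order is only a stand-in for parallel reads that finish in an arbitrary order.

```python
import random

# Hypothetical table split into partitions that are read in parallel.
PARTITIONS = [
    [(7, "g"), (2, "b")],
    [(5, "e"), (1, "a")],
    [(4, "d"), (3, "c"), (6, "f")],
]

def scan(partitions, rng):
    """Yield rows partition by partition, in an arbitrary partition order."""
    order = list(range(len(partitions)))
    rng.shuffle(order)  # stands in for parallel reads finishing in any order
    for i in order:
        yield from partitions[i]

def head_strict(partitions, n, rng):
    """Strict ordering: sort every row by the index first (the ORDER BY cost)."""
    return sorted(scan(partitions, rng), key=lambda row: row[0])[:n]

def head_partial(partitions, n, rng):
    """Relaxed ordering: take the first n rows that arrive, then stop reading."""
    rows = []
    for row in scan(partitions, rng):
        rows.append(row)
        if len(rows) == n:
            break
    return rows
```

In this sketch `head_strict` returns the same three rows no matter how the partitions arrive, but it has to read and sort the whole table. `head_partial` stops after three rows, and which three it returns depends on the arrival order.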

The lazy evaluation is another interesting compromise. BigQuery runs on top of really big clusters, with tens of thousands of servers each. It is designed to run complex queries, and has an advanced optimizer. We do lazy evaluation because all Pandas API calls are transformed into an abstract syntax tree, and the actual operations are left pending execution. A BigFrames data frame is a “promise” of a data frame – a name, and a pending log of operations. When we execute the operations, they are all combined by the optimizer. For example, we might detect that a later filter would remove rows processed by an earlier operation, and apply the filter first.

Map-reduce systems have always dealt with choices like “should we sort the data then hash it for a join, or should we hash, join then shuffle sort?”. By using lazy execution we give ourselves a chance to use the optimizations and save the user money and time. Depending on how the user is paying for BigQuery, the amount of scanned data matters for cost and we are, again, 100% focused on customers. The first version of BigFrames we shipped was too expensive, and today we are on par with SQL.
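The “promise plus pending log” idea described above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the BigFrames implementation: `LazyFrame`, `optimize` and `execute` are hypothetical names, the “optimizer” knows a single rule (move a filter ahead of earlier column assignments it does not depend on), and an in-memory dictionary stands in for the warehouse.

```python
from dataclasses import dataclass, field

@dataclass
class LazyFrame:
    """A 'promise' of a data frame: a source name plus a log of pending operations."""
    source: str
    ops: list = field(default_factory=list)

    def assign(self, name, fn):
        return LazyFrame(self.source, self.ops + [("assign", name, fn)])

    def filter(self, column, predicate):
        return LazyFrame(self.source, self.ops + [("filter", column, predicate)])

def optimize(ops):
    """Move each filter ahead of earlier assigns that do not create its column."""
    plan = []
    for op in ops:
        pos = len(plan)
        if op[0] == "filter":
            while pos > 0 and plan[pos - 1][0] == "assign" and plan[pos - 1][1] != op[1]:
                pos -= 1
        plan.insert(pos, op)
    return plan

def execute(frame, tables):
    """Run the optimized plan; return the rows and how many rows 'assign' touched."""
    rows = [dict(r) for r in tables[frame.source]]
    assigned = 0
    for kind, column, fn in optimize(frame.ops):
        if kind == "filter":
            rows = [r for r in rows if fn(r[column])]
        else:
            for r in rows:
                r[column] = fn(r)
            assigned += len(rows)
    return rows, assigned

# Nothing runs until execute(); the filter written last is applied first.
tables = {"orders": [
    {"country": "BR", "price": 10, "qty": 2},
    {"country": "US", "price": 5, "qty": 1},
    {"country": "BR", "price": 3, "qty": 4},
]}
frame = (LazyFrame("orders")
         .assign("total", lambda r: r["price"] * r["qty"])
         .filter("country", lambda c: c == "BR"))
rows, assigned = execute(frame, tables)
```

Because the whole log is visible before anything runs, the computed column is evaluated only for the rows that survive the filter, which is the kind of saving deferred execution makes possible.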

When it comes to stateful operations, we support them in two ways. The data frames in BigFrames are more of a promise of a data frame than an actual data frame. When reading data from BigQuery the data frame contains a reference to a server side snapshot of the table. When writing to BigQuery the append operations are kept local until enough changes accumulate and we flush them to a temp table, or the user does an operation that triggers the flush. The data frame also contains a log of pending transformations. The user can call execute() on the data frame and BigFrames will apply the transformations locally if possible, or just fetch the results, which will cause a global optimization of pending transformations and a server call. The server call might be a direct storage operation (read/write) or a SQL job.

We also support Python UDFs, and those can retain state themselves. When the user performs an “apply(function)” operation, the function might be a remote function, which supports full web applications as backend, or a managed Python function. The user can, for instance, create a remote function that connects to Hugging Face, downloads a transformer, caches it offline, and exposes an API call to BigQuery. We will only initialize the web application when we launch it or add new instances of it, but every call to the UDF will benefit from the state of the server.


Q3. BigQuery’s UDF story has evolved from SQL/JavaScript UDFs that run in-process, to remote functions that call out to Cloud Functions, and now BigFrames 2.0 adds Python UDFs with a @udf decorator. Can you walk us through the architectural evolution and the limitations each approach addresses?

In particular:

  • Execution model tradeoffs: Running Python UDFs via Cloud Functions means network round-trips for every batch of rows. What’s the performance penalty in practice, and how do you amortize this cost through batching strategies? How large do result sets need to be before remote UDF overhead dominates total query time?
  • State management: Traditional UDFs can’t maintain state across invocations (by design, for parallelization). But data scientists often want to do things like “apply this pretrained ML model to every row” where loading the model once and reusing it would be far more efficient. How does BigFrames handle this? Can you cache model objects across UDF invocations, or does every batch reload from scratch?
  • Error handling and debugging: When a Python UDF crashes on row 4,782,391 of a 10-million-row table, how do data scientists debug this? What visibility do you provide into UDF execution, and how do you balance comprehensive logging with the cost/performance implications of collecting it at scale?
  • Security boundaries: Allowing arbitrary Python code to run is a massive security surface. How do you sandbox UDF execution to prevent: (a) accessing other customers’ data, (b) egress of sensitive data, (c) abuse of compute resources (crypto mining, etc.)?

Ivan Santa Maria Filho: I think it is important to say the UDFs are used by BigFrames, but users don’t need BigFrames to use them. They can declare and use them from SQL. We did not want to create a proprietary API for this, so we extended the public SQL API instead. This is a recurring theme for our team.

We expect the UDF space to evolve a lot in 2026 and 2027. BigQuery supports SQL UDFs, JavaScript UDFs, Remote Functions, and now Python managed UDFs. JS runs in a sandbox, which is itself inside a nested VM, running on the same set of machines as BigQuery workers. There is no network cost, but there are costs to launch the VM and inter-process costs too. For remote and managed UDFs we currently run them on Cloud Run, and we have the network costs. For those, we batch rows to amortize costs, and we have invested a significant amount of time to make the serialization and deserialization costs low.
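Why batching amortizes the network cost can be seen with simple arithmetic. The sketch below is a back-of-the-envelope model, not a BigQuery formula: `total_seconds` is a hypothetical helper, it assumes a single worker making sequential calls, and the round-trip and per-row figures in the usage lines are made-up numbers chosen only for illustration.

```python
import math

def total_seconds(rows, batch_size, round_trip_s, per_row_s):
    """Estimated wall time for one worker calling a remote function in batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    calls = math.ceil(rows / batch_size)  # one network round trip per batch
    return calls * round_trip_s + rows * per_row_s

# Illustrative numbers only: 1M rows, 50 ms per round trip, 1 ms of work per row.
one_row_per_call = total_seconds(1_000_000, 1, 0.05, 0.001)
thousand_per_call = total_seconds(1_000_000, 1000, 0.05, 0.001)
```

Under these assumed numbers the round trips dominate when each call carries one row, and become a small fraction of the total once each call carries a thousand rows.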

This might sound counter-intuitive, but the biggest performance problem is not the network. The biggest challenge for us is to teach the optimizer how long individual UDFs take to process a row, and how many parallel calls we should be making, with how many rows on each call. For our first iteration we will ask users to help us by setting core counts, RAM and concurrency level. We will give them telemetry and logging to let them make that call. Over time we want to watch the UDFs and adjust the settings automatically, but that will come later.

For your specific question, we support fairly complex UDFs. One of my first tests was to call Hugging Face from the UDF and set up a local pipeline (local to the UDF runtime, in Cloud Run). The UDF had two dozen Python functions defined, one to fetch my developer keys from our key service (KMS), another to take the key and download a text pipeline from Hugging Face, another to store the weights and set up a local cache, and so on. One of those Python functions was the UDF entry point.

When we instantiate the UDF, or auto-scale it by adding instances, we run the UDF body as if it were a main function in Python. I used that to set up the stateful model locally in the Cloud Run instance. When BigQuery calls the UDF, it calls the entry point function. You can find a similar example calling Google’s translation APIs – the client is instantiated only once.
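The “module body runs once, entry point runs per call” pattern is ordinary Python, and a small stand-alone sketch shows the shape of it. Everything here is hypothetical: `load_model` stands in for an expensive download, the word lists stand in for model weights, and `entry_point` is just an illustrative name, not a BigQuery convention.

```python
LOAD_COUNT = 0  # how many times the expensive setup has run in this instance

def load_model():
    """Stand-in for downloading weights; a real UDF might fetch a pipeline here."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return {"positive": {"good", "great"}, "negative": {"bad", "awful"}}

# Module body: runs once when the instance starts, like a Python main function.
MODEL = load_model()

def entry_point(text):
    """Called for each value; reuses MODEL instead of reloading it."""
    words = set(text.lower().split())
    score = len(words & MODEL["positive"]) - len(words & MODEL["negative"])
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

labels = [entry_point(t) for t in ["Great product", "bad and awful", "it arrived"]]
```

However many times `entry_point` is called within one instance, `LOAD_COUNT` stays at 1; a new instance would pay the setup cost again.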

We are considering a Python UDF version that runs in the shard like the JavaScript UDF, but it will depend on customer demand.

Error handling with data frames and Python is one of the advantages this approach has over SQL. If the user calls a function per data frame row, they can assign the return code to another data frame column, then later use a filter to retry only the failed rows. SQL in general would force the user to rerun the query, which would process every row again. For example, let’s say you want to send emails to customers matching given criteria using UDFs and SQL. Then assume that “SELECT send_email(customer_email) WHERE …” would select 10k users. If the send_email function fails for any of them, BigQuery would retry the entire job. The assumption of the SQL language is that send_email() has no side effects until the entire job is successful, which is very likely not true. This is a very easy way to spam customers. Using Python and “apply()”, the send_email UDF can return a pass/fail return code, and a simple while loop can retry only the failed rows using a filter. This is also doable in SQL, but it is hard enough that it makes for a good interview question.
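The retry-only-the-failures pattern described above can be sketched with plain Python rows in place of a data frame. This is an illustration under assumptions, not BigFrames code: `send_email` here is a hypothetical stand-in that simulates one transient failure, `send_with_retry` is an invented helper, and a bounded loop replaces the open-ended while loop.

```python
def send_email(address, attempt):
    """Hypothetical UDF: returns "ok" or "failed" instead of raising."""
    # Simulated transient failure: "flaky" addresses fail on the first attempt.
    if "flaky" in address and attempt == 1:
        return "failed"
    return "ok"

def send_with_retry(rows, max_attempts=3):
    """Apply send_email row by row, then retry only the rows that failed."""
    call_log = []  # every call actually made, as (attempt, address)
    for row in rows:
        row["status"] = "pending"
    for attempt in range(1, max_attempts + 1):
        pending = [r for r in rows if r["status"] != "ok"]  # the filter step
        if not pending:
            break
        for row in pending:
            row["status"] = send_email(row["email"], attempt)
            call_log.append((attempt, row["email"]))
    return rows, call_log

customers = [
    {"email": "a@example.com"},
    {"email": "flaky@example.com"},
    {"email": "c@example.com"},
]
customers, call_log = send_with_retry(customers)
```

The status column is what makes the retry selective: on the second pass only the one failed address is called again, so the customers who already received their email are not contacted twice.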

Security is very important. Google enforces that all services and microservices have multiple security boundaries. For code running in the same machine as BigQuery processes, for example, user code runs in a sandbox, and the sandbox runs inside a gVisor VM. The gVisor VM has no IO stack and a very limited surface, and that is the public part of the solution. We have additional hardware, software, and network controls in place.

For managed Python you can safely assume we have at least the same mitigations in place, very robust monitoring, plus we deploy the code to Cloud Run, which sits on another cluster using a restricted configuration. For functions running in Cloud Run it is possible to access the Internet, but the user has to specify a connection configuration, which includes a service account, grant that service account the correct permissions, and make sure the VPC settings in their project allow it. If the project is configured to have Internet access, and the UDF creator has the right to create service accounts and connections and permissions to access the Internet, then it is possible to copy the data outside Google. By default there is no Internet access, so the user has to do work to enable it.


Q4. You mentioned BigFrames would “certainly explain the limitations of BigQuery.” Let’s dig into that. What are the most significant BigQuery architectural decisions that constrain what BigFrames can do, and how do these manifest as surprising limitations for users?

For example:

  • Storage format constraints: BigQuery’s columnar storage and partitioning strategy presumably makes some pandas operations prohibitively expensive. What operations fall into this category? Are there pandas patterns that work fine on 10GB but break completely at 10TB due to BigQuery’s architecture?
  • Type system mismatches: Pandas supports Python’s dynamic typing; BigQuery has a strict schema. How do you handle cases where a pandas operation would dynamically change column types based on data content? Do you fail at query planning time, or try to infer schemas and potentially fail at execution time?
  • Result size limits: BigQuery DataFrames 2.0 changed allow_large_results to default to False, failing queries that return >10GB compressed data. This is a dramatic departure from pandas’ “it fits in RAM or it doesn’t” model. How do you help users understand when they’re bumping against this limit, and what patterns do you recommend for working around it (beyond just “set the flag to True”)?
  • Transaction semantics: Pandas DataFrames are just objects; mutations are immediate and in-memory. BigFrames operations compile to queries. What happens when users expect ACID transaction semantics (e.g., “update these 3 tables atomically”) but you’re generating separate SQL statements?

Ivan Santa Maria Filho: BigQuery is designed to support SQL, to scale to datasets with PBs of data, and to use highly optimized, controlled SQL engine operators. It works exceptionally well for what it was designed to do. When it comes to running arbitrary user code, I believe we could do much more.

Many choices get harder at scale. The simplest one to describe is supporting the implicit ordering of rows. If you have 1GB of data, dropping an index and computing a new one will take a couple of seconds. If you have 10TB that will take longer, maybe not linearly longer, but longer. There is no magical way to fix this problem.

We could pull a page from RDBMS and use a B-Tree and clustering keys as storage, but BigQuery reads data from multiple partitions in parallel, and the data would return in random order. We could use a single partition for data frames storage, but that would limit scale and performance. It would also force a table rebuild when the index changes. We could use B-Trees and secondary indices to simulate a table scan. We could inject sort operators over a computed index column. Every option consumes time and raises the cost to our users.

We are offering the Pandas semantics by default, so users are not surprised, but also a mode more similar to what Polars and databases do. If our customers tell us this is acceptable, we would make it the default, otherwise continue to look for the best way to gain scale with the Pandas semantics.

The type mismatches are always a problem. Python uses duck-typing, but it also supports a very rich type system, with several Python libraries having their own data types, both simple and complex types. BigQuery is strongly typed, so we cannot just pass the bytes around; we have to convert from what is stored in the BigQuery cells to something that makes sense in Python. Those conversions can be expensive, particularly if the user is applying a UDF to a column or data frame. The data will be in BigQuery and passed to the UDF row-wise or column-wise depending on the call syntax. The way that works is that BigQuery will partition the table holding the data frame data, and send each partition to a worker. This worker will read the data from our store and send it to the worker hosting the UDF. We do what we can to optimize this step, but that does not change the fact that the data in the store is in a different encoding than what Python expects. Even timestamps have different resolutions in BigQuery.

The result set size limit has a dual purpose. Certain operations have no inherent limits other than BigQuery limits. Applying a UDF over rows will scale well, and because of it the user might not even realize they are scanning hundreds of TBs of data. That can become really expensive, and the only billing surprise we like is when the price is lower than expected. The size limit is an attempt to avoid bad surprises.

The other purpose is to avoid crashing a notebook. If the user tries to render 10GB of data points in a notebook widget, odds are that will crash the notebook. One unique problem with very large datasets and series is that one cannot just plot every point. They also cannot just naively sample the data because they might miss a maximum, minimum, or anomalous data point. We are considering adding decimation algorithms to reduce the granularity of the series but retain its shape, maybe building that into BigFrames, but ideally contributing this to an OSS project.
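To make the sampling problem concrete, here is one common decimation approach, per-bucket min/max, as a minimal sketch. It is offered only as an illustration of the idea: the source does not say which algorithm BigFrames would adopt, and `minmax_decimate` is a hypothetical helper name.

```python
def minmax_decimate(values, buckets):
    """Keep the minimum and maximum of each bucket, in their original order."""
    if buckets <= 0:
        raise ValueError("buckets must be positive")
    n = len(values)
    if n <= 2 * buckets:
        return list(enumerate(values))  # already small enough to plot as-is
    kept = []
    for b in range(buckets):
        start, stop = b * n // buckets, (b + 1) * n // buckets
        chunk = range(start, stop)
        lo = min(chunk, key=lambda i: values[i])  # index of the bucket minimum
        hi = max(chunk, key=lambda i: values[i])  # index of the bucket maximum
        kept.extend((i, values[i]) for i in sorted({lo, hi}))
    return kept

# A flat series with one spike and one dip that stride sampling would miss.
series = [0.0] * 1000
series[137] = 50.0
series[842] = -20.0
naive = series[::100]                    # every 100th point
points = minmax_decimate(series, 10)     # at most 2 points per bucket
```

In this example the stride sample contains neither the spike nor the dip, while the decimated series keeps both with at most twenty points, which is the property that matters when a plot must preserve the shape of the data.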

As far as ACID semantics go, BigFrames does not support complex transaction boundaries. There is no way to express that changes to two data frames should both be committed or not committed. That said, for a single data frame BigFrames uses a “copy on mutate” approach, writing all changes to a new “backing table” then linking the client object to the resulting table if everything goes right. We could investigate a way to have cross-data frame transactions, but never got that requirement.
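The “copy on mutate” idea for a single data frame can be sketched with an in-memory dictionary standing in for server-side tables. This is a toy under stated assumptions, not how BigFrames is implemented: `Store`, `Frame`, `mutate` and the `backing_N` naming are all hypothetical.

```python
import itertools

class Store:
    """Hypothetical stand-in for server-side tables, keyed by name."""
    def __init__(self):
        self.tables = {}
        self._ids = itertools.count(1)

    def new_name(self):
        return f"backing_{next(self._ids)}"

class Frame:
    """Client object that points at one backing table at a time."""
    def __init__(self, store, rows):
        self.store = store
        self.table = store.new_name()
        store.tables[self.table] = list(rows)

    def mutate(self, fn):
        """Write the changed rows to a new table; relink only on success."""
        new_rows = [fn(dict(row)) for row in self.store.tables[self.table]]
        new_table = self.store.new_name()
        self.store.tables[new_table] = new_rows
        self.table = new_table  # reached only if every row transformed cleanly

    def rows(self):
        return self.store.tables[self.table]

def double(row):
    row["x"] *= 2
    return row

def invert(row):
    row["x"] = 1 / row["x"]  # raises ZeroDivisionError when x is 0
    return row

store = Store()
frame = Frame(store, [{"x": 1}, {"x": 0}])
frame.mutate(double)          # succeeds: frame now points at a new table
try:
    frame.mutate(invert)      # fails part-way: frame keeps its current table
except ZeroDivisionError:
    pass
```

Because the client object is relinked only after the new table is fully written, a failed mutation leaves the frame pointing at its last good state. Coordinating two frames this way would need an extra commit step, which is the cross-data-frame case the answer says is not supported.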


Q5.  Looking forward, we’re seeing an explosion of “pandas-like” APIs: Dask, Modin, Polars, BigFrames, Snowpark Python, Databricks pandas API on Spark. Is the data science ecosystem converging toward pandas as a universal interface, or are we headed for fragmentation as each implementation adds vendor-specific extensions?

More philosophically:

  • API surface versioning: Pandas releases new versions regularly with API changes. How does BigFrames handle pandas version compatibility? Do you target a specific pandas version, or try to track the latest? What happens when pandas adds a feature you can’t support efficiently in BigQuery?
  • Beyond pandas: You mentioned that BigFrames 2.0 adds multimodal capabilities for unstructured data (images, text). Pandas wasn’t designed for this. At what point does extending the pandas API for new use cases become counterproductive, and you should just design a new API that’s purpose-built for distributed, multimodal data processing?
  • ML integration: BigFrames includes bigframes.ml with a scikit-learn-like API for BigQuery ML. But modern ML workflows involve PyTorch, TensorFlow, Hugging Face transformers, etc. How do you see the integration of these frameworks evolving? Will we see bigframes.torch or bigframes.transformers, or is there a fundamental mismatch between these frameworks’ execution models and BigQuery’s architecture?
  • Standards vs. ecosystems: Would the data science community benefit from a formal standard for “distributed dataframe APIs” (similar to how SQL standardized relational queries), or is the current Cambrian explosion of implementations actually healthy for innovation?

Ivan Santa Maria Filho: For API versioning, we follow the same model the OSS community does, with major and minor versions. We are expecting many large updates from Python and Pandas this year, and we are keeping up with the changes.

My take is that the ecosystem will continue to fragment for a while, and that is not necessarily bad. We have enough innovation in this space that both clients and backends are evolving and have diverse feature sets. It is quite hard to offer a smooth, common surface across backends, without compromising performance and / or cost. By the time any industry gets to be fully standardized, that is usually the time it is also commoditized, and investment slows.

The BigQuery team added support for multi-modal data, auto-generation of embeddings, and auto-quantization of models, making extraction and inferencing way cheaper. Most data in enterprises everywhere is not structured. The amount of data stored in documents, intranet pages, email, calendars, and collaboration / chat tools is way higher than data curated in tables. 

I don’t see the point of hiding this functionality from customers, but I also don’t want to pollute the Pandas API namespace. We try to be as explicit as possible, so users know what is, and what is not a Pandas default API, but we make our extensions interoperable. 

For example, it is fairly easy to perform sentiment analysis on a support phone call audio recording, then join the sentiment and user data in BigQuery so a CRM application can track how happy the customer was, and which issues they cared about.

It is getting increasingly easy to instruct an agent to watch the general sentiment around a product and only warn us when something changes. 

The development around agents makes it harder to predict the future of Pandas-like frameworks. Given the current investment level, fragmentation is a natural evolution of this space, but if we achieve an agentic solution that produces results by answering questions in English, the mechanisms to handle data will be less popular.

The agents themselves will need a language to express what they want, but the number of direct active users might go down drastically. We might finally end up with something similar to the Star Trek Enterprise computer, and at that point I just don’t see a regular data scientist or business analyst writing Python directly. 

…………………………………………………………………………

Ivan Santa Maria Filho has a BSc and MSc in computer science and a wide variety of experiences as individual contributor and manager, having owned a small software company and worked on multiple billion dollar products and services at Microsoft, Meta and Google. His main areas of expertise include vertical integration of stateful, large scale services with ephemeral VM infrastructure, and the infrastructure itself.




Additional Context for ODBMS.org Readers:

What is BigFrames? BigQuery DataFrames (BigFrames) is an open-source Python library that provides a pandas-compatible API for analyzing data stored in BigQuery. Unlike pandas, which loads data into local memory, BigFrames translates operations into BigQuery SQL, enabling data scientists to work with terabyte-scale datasets using familiar pandas syntax.

Why does this matter? Most data scientists learn pandas, but pandas doesn’t scale beyond single-machine memory limits. BigFrames (and competitors like Databricks pandas API, Snowpark Python) represent a new generation of tools that preserve familiar APIs while transparently distributing computation. Understanding the tradeoffs in these systems helps organizations choose the right tools and helps researchers understand the limits of API compatibility.

Key Technical Innovation: BigFrames uses a transpilation approach: pandas operations → Ibis intermediate representation → SQLGlot SQL generation → BigQuery execution. This allows Google to avoid directly bundling pandas code while maintaining API compatibility – a fascinating case study in software architecture and licensing strategy.

……………………………..

Feb 9 26

On AI and the Future of Rail Systems: Interview with Roland Edel

by Roberto V. Zicari

“AI reshapes rail jobs by reducing repetitive tasks and giving staff more responsibility for decision‑making. It also enables engineers and project teams to focus more on innovative and creative work, as well as to deliver complex rail projects on time and on budget. Technicians work increasingly data‑driven, dispatchers make better‑informed decisions, and drivers gradually move into supervisory roles for automated systems.”

Q1. As CTO of Siemens Mobility, you oversee one of the world’s most critical transportation infrastructure portfolios. When you look at the global rail industry today, where do you see AI and advanced algorithms creating the most transformative opportunities—not just for operational efficiency, but for fundamentally reimagining how rail systems serve cities and nations? What convinced you that AI was no longer optional but essential for the future of mobility?

Roland Edel: Data and Artificial Intelligence already make rail transport faster, more stable and more reliable—often without passengers even noticing. Today, AI detects early deviations in vehicles and infrastructure, analyses camera data and prevents disruptions before they materialize.

The next major step in the long run is Driverless Train Operations (DTO) with a Grade of Automation (GoA) 3 in mainline operations. In earlier projects such as BerDiBa and safe.trAIn, we developed foundational technologies that we are now applying in current projects like R2DATO and RemODtrAIn. Here, we are shaping the transition from semi‑automated operations (GoA2), including our ATO over ETCS project with S‑Bahn Hamburg, to fully automated operations (GoA4) or remote operations in stabling areas.

This requires close integration of onboard intelligence, sensors, digital infrastructure and signalling. These technologies lay the foundation for a system that can scale reliably even as demand grows.

For me, the turning point in our automation projects came when data on optimized train planning and energy savings made one thing unmistakably clear: analytics, algorithms and AI deliver tangible operational benefits—from more efficient planning to reduced energy consumption and more stable performance.

Q2. Many industries struggle to move AI initiatives from successful pilot programs to enterprise‑wide implementation. Rail systems are particularly complex—they involve safety‑critical operations, legacy infrastructure, multiple stakeholders, and regulatory frameworks that prioritize reliability above all else. What have been the biggest organizational and operational challenges you’ve encountered in scaling AI applications across Siemens Mobility’s rail portfolio, and how have you approached the tension between innovation and the rail industry’s paramount focus on safety?

Roland Edel: Scaling AI in the rail domain works only if we are able to incorporate safety‑critical functions into our innovations. Safety logic remains deterministic and certified; AI is added only where it is fully verifiable. Deployment follows a stepwise approach: first in depots, then in shunting areas, and later on the mainline.

Projects such as AutomatedTrain and others, in which we collaborate closely with an ecosystem of external partners, demonstrate how essential robust error detection and sensor fusion are for ensuring safe perception in open environments. At the same time, modern tools allow us to update safety‑relevant software during ongoing operations, keeping systems updated without compromising availability.

This combination—clear boundaries, strong diagnostics and incremental rollout—has proven to be the right way to balance innovation with the industry’s uncompromising safety culture. Finally, it all comes down to people: we can only scale AI when we train our employees accordingly and embed data and AI into all our processes.

Q3. AI is only as good as the data it learns from. Rail systems generate enormous amounts of operational data, but often in silos. From a leadership perspective, what does it take to build the data infrastructure that makes AI in rail reliable? How do you convince diverse stakeholders to share and standardize data?

Roland Edel: Trustworthy AI requires trustworthy data across the entire lifecycle of a rail system. That is why we increasingly rely on digital twins that connect design, engineering, manufacturing, operations and servicing. From the first CAD model to condition‑based maintenance and real‑time operations, a digital twin ensures that data remains consistent, interoperable and available wherever it is needed.

Open interfaces, standardized data models and federated platforms make this possible in practice. Our Railigent X suite plays a central role by integrating engineering data, vehicle data, infrastructure information and operational insights, while keeping operators in full control of their data.

When lifecycle data becomes interoperable, system availability improves, analytics become more precise, and the entire network operates more reliably and economically. And this is where stakeholders become convinced: when real projects demonstrate better services, higher reliability, improved cost structures and full data sovereignty. Once these benefits are visible, data collaboration stops being a hurdle and becomes an accelerator for innovation.

Q4. Predictive maintenance is often cited as AI’s ‘killer application.’ What is the realistic business case, and what has surprised you most about what it takes to make it work?

Roland Edel: Predictive maintenance delivers measurable business value: higher availability, reduced lifecycle costs and more efficient maintenance planning. AI uncovers patterns that humans cannot detect and enables precisely timed interventions.

What surprised me most was that cultural change often matters more than the algorithms themselves. Teams need to take the predictions into account, understand their implications and adapt work processes accordingly. Financially, the payoff is significant but requires patience; it is a long‑term investment.

The next step is what we call Predictive Availability, where entire functional chains—not just single components—remain stable. This includes linking data from incident reports, diagnostics, measurements, visual inspections and operational context into one lifecycle digital twin. This system understanding allows AI to anticipate disruptions earlier and more reliably.

The approach works well already, but its full potential depends on even closer collaboration across the ecosystem.

Q5. The rail industry is exploring different levels of automation. What framework do you use to decide what to automate first, and how do you balance safety, public trust and workforce concerns?

Roland Edel: We automate according to a clear framework: start where the environment is controlled and the benefits are greatest. Depots are ideal—they offer structured, repeatable processes with high potential for efficiency gains. Automation then moves to stabling and shunting yards, supported by AI‑driven obstacle detection and remote operation. From there, automation can be extended progressively.

At the same time, the human role remains central. Rare, complex edge cases are still best handled by experienced staff, so automation supports people rather than replaces them. Public trust grows when the benefits are transparent (greater safety, greater punctuality, fewer routine tasks) and when rollout is gradual. Each phase builds experience and confidence for the next.

Q6. Rail is already energy efficient. How big is AI’s role in sustainability, and how do you manage trade-offs?

Roland Edel: AI is one of the strongest levers for energy efficiency in rail transport. Automated driving profiles reduce energy consumption, maximize regenerative braking and minimize wear. AI‑based timetable optimization smooths traffic flows and prevents unnecessary stop‑and‑go patterns. To unlock these benefits across the entire network, data from vehicles, infrastructure and operations must be integrated. That is why we have introduced Siemens Xcelerator principles across our portfolio—Railigent X, Signaling X and the Mobility Software Suite X—to create modular cloud‑based software, interoperable APIs and an open ecosystem. Trade‑offs between energy efficiency and service frequency can be managed intelligently: AI enables the optimization of both simultaneously by balancing demand, capacity and operational constraints in real time.

Q7. AI and automation raise important questions about the future of work in rail. How do you approach workforce concerns, and what skills will be needed?

Roland Edel: AI reshapes rail jobs by reducing repetitive tasks and giving staff more responsibility for decision‑making. It also enables engineers and project teams to focus more on innovative and creative work, as well as to deliver complex rail projects on time and on budget. Technicians work increasingly data‑driven, dispatchers make better‑informed decisions, and drivers gradually move into supervisory roles for automated systems.

To support this shift, we invest in targeted training: digital learning platforms, simulation environments and hands‑on programs that build confidence in new tools. AI does not eliminate jobs; it modernizes them, creating more attractive, safer roles with clearer career perspectives.

Q8. Rail is heavily regulated. How do you work with regulators to build confidence in AI, and how do you earn public trust?

Roland Edel: Regulators are rightly accustomed to deterministic, fully explainable systems. We therefore involve them early—long before an AI‑based function enters the approval process. Together with our partner ecosystem, we develop methods to make AI systems traceable, testable and auditable, including virtual testbeds, robust perception validation and hybrid architectures that ensure safety‑critical logic remains reliable and predictable.

The overall system must remain predictable, and every AI‑supported decision must stay within defined boundaries. Continuous monitoring is essential: sensors and algorithms must detect when they deviate from expected performance and transition into safe states. Public trust grows through transparency, real‑world performance and a phased introduction—starting in controlled environments like depots and only later in passenger service.
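The pattern described here, a deterministic layer that keeps every AI-supported decision inside defined boundaries and falls back to a safe state when monitoring detects a deviation, can be sketched roughly as follows. This is an illustration only, not Siemens Mobility code: `SpeedSupervisor`, `Suggestion`, `MAX_SPEED_KMH` and `MIN_CONFIDENCE` are hypothetical names and values chosen for the example.

```python
import math
from dataclasses import dataclass

# Hypothetical limits for illustration only; real values would come from
# certified safety requirements, not from this sketch.
MAX_SPEED_KMH = 25.0   # upper bound the AI suggestion may never exceed
MIN_CONFIDENCE = 0.8   # below this, the AI output is not trusted


@dataclass
class Suggestion:
    speed_kmh: float   # speed proposed by the AI component
    confidence: float  # self-reported confidence in [0, 1]


class SpeedSupervisor:
    """Deterministic guard around a non-deterministic suggestion."""

    def __init__(self) -> None:
        self.safe_state = False

    def decide(self, s: Suggestion) -> float:
        # Once in the safe state, stay there until an explicit reset.
        if self.safe_state:
            return 0.0
        # Monitoring: implausible or low-confidence output triggers the safe state.
        implausible = not math.isfinite(s.speed_kmh) or s.speed_kmh < 0.0
        if implausible or s.confidence < MIN_CONFIDENCE:
            self.safe_state = True
            return 0.0
        # Boundary: clamp the suggestion into the permitted envelope.
        return min(s.speed_kmh, MAX_SPEED_KMH)


supervisor = SpeedSupervisor()
print(supervisor.decide(Suggestion(18.0, 0.95)))  # accepted as suggested
print(supervisor.decide(Suggestion(40.0, 0.95)))  # clamped to the upper bound
print(supervisor.decide(Suggestion(10.0, 0.50)))  # low confidence: safe state
print(supervisor.decide(Suggestion(10.0, 0.99)))  # still in the safe state
```

The design choice worth noting is that the guard is plain, testable logic: the AI component can change without changing the rules that bound it.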

Q9. Looking ahead to 2030, what does a realistic AI‑enabled rail system look like? And what challenges keep you up at night?

Roland Edel: By 2030, AI will be an almost invisible yet essential part of rail operations. Passengers will benefit from more reliable services, clearer information and smoother journeys. Data and AI will also enable highly personalized mobility services—from multimodal Mobility‑as‑a‑Service offerings to AI‑powered travel companions that proactively guide passengers throughout their journey.

Operators will rely on cloud‑based signaling, automated depots, predictive maintenance and digital supply chains. The system will become more resilient, flexible and climate‑friendly, and new applications will emerge. Three challenges remain. First, regulation and standards must evolve quickly enough to keep pace with innovation while maintaining safety. Second, the industry needs broader data and architecture harmonization across operators, suppliers and infrastructure owners. Third, workforce transformation must accelerate to align skills with new technologies.

To shape the Data & AI transformation in rail, we must open our data and platforms, modularize software, build digital twins and trustworthy industrial AI, strengthen ecosystem partnerships and accelerate deployment with confidence and purpose.

………………………………………………………………………………………………………

Roland Edel has been Chief Technology Officer and Head of Technology & Innovation at Siemens AG’s Mobility & Logistics Division in Munich since 2011. Since October 2014, the Division has operated under the name Mobility.

After joining Siemens AG in Erlangen in 1993 as a design and development engineer at Transportation Systems, Roland Edel went on to assume various managerial roles within the former Electrification Division between 1996 and 2003. From 2003 onwards he was responsible for Engineering, Development and Product Management within the Business Unit Rail Electrification for five years. Roland Edel subsequently took charge of engineering and development within the newly formed Business Unit Turnkey, Electrification and Transrapid in Erlangen, before moving on to assume the position of Chief Technology Officer and Head of Innovative Mobility Solutions in the Business Unit Complete Transportation in 2009.

Resources:

Digital Transformation for Rail, Siemens Mobility.

……………………………..

Follow us on X

Follow us on LinkedIn

Nov 25 25

Twenty Years of Conversations: Reflections on Technology and Society

by Roberto V. Zicari, Editor, ODBMS.org

“Because ultimately, what these twenty years of dialogue have taught me is that technology is never just about the technology. It’s about us, and the world we choose to build together.”

When I launched ODBMS.org in 2005, the technology landscape looked remarkably different. Object databases were the conversation. SQL versus NoSQL was a heated debate. The cloud was still a meteorological term for most developers. Twenty years and hundreds of interviews later, what strikes me most isn’t just how much technology has changed, but how profoundly it has reshaped the questions we ask.

In those early years, our conversations centered on technical elegance—data models, query optimization, transactional consistency. We debated whether object-relational mapping would bridge two worlds or create new complexities. These were important questions, but they were questions about technology itself.

Today’s conversations reveal a different world. When I interview leaders now, we discuss trust frameworks for AI in clinical care, the societal implications of real-time data streams that move billions of dollars in milliseconds, the responsibility that comes with systems that make life-or-death healthcare decisions. The technology hasn’t just gotten faster or more powerful—it has become deeply embedded in the fabric of human decision-making.

This evolution reflects something fundamental: we’ve moved from asking “Can we build this?” to asking “Should we build this?” and “What happens when we do?” The practitioners I’ve spoken with over two decades—from Vinton Cerf discussing internet governance to recent conversations about AI ethics and trustworthy systems—increasingly grapple with questions that transcend engineering.

The patterns that emerge from twenty years of dialogue are striking. First, the acceleration is real and relentless. A database professional from 2004 measuring latency in hundreds of milliseconds would be stunned by today’s nanosecond-level systems. But speed alone tells an incomplete story. What matters more is the expanding scope of impact. Systems that once managed business transactions now influence medical treatments, shape financial markets, and mediate human knowledge.

Second, every technological breakthrough creates new responsibilities. The Big Data revolution promised insights; it delivered privacy challenges. Cloud computing promised accessibility; it raised questions about data sovereignty. Generative AI promises creativity; it demands frameworks for attribution, bias, and trust. Each wave of innovation brings not just solutions but new ethical territories to navigate.

Third, the gap between possibility and wisdom persists. We can build systems of remarkable sophistication, yet we struggle with governance, interpretability, and equitable access. The technical challenges we once obsessed over—scalability, performance, reliability—now seem almost quaint compared to the societal challenges of ensuring technology serves humanity rather than destabilizing it.

Perhaps most significantly, I’ve watched the democratization of technology amplify both its potential and its risks. Open source movements have accelerated innovation beyond what any single corporation could achieve. Yet this same openness means that powerful capabilities spread faster than our collective wisdom about their use.

Looking back through twenty years of expert articles and interviews, I see an arc from technical optimism to responsible pragmatism. The pioneers I spoke with in 2005 were building the future with enthusiasm and relatively few constraints. Today’s innovators build with one eye on capability and another on consequence. They think not just about systems that work, but about systems that work for society.

The database and data management community has always been at the intersection of possibility and reality. We store, structure, and serve the information that powers decisions. Now, as that information flows through AI systems and influences outcomes at unprecedented scale, our responsibility extends beyond technical excellence to social awareness.

As ODBMS.org enters its third decade, we are more committed than ever to addressing these pressing issues head-on. The portal has evolved to tackle the urgent questions emerging from the generative AI era—questions about trustworthy AI systems, responsible deployment, bias and fairness, data provenance, and the governance frameworks needed for AI in critical domains like healthcare and finance. Our conversations now explore not just how these systems work, but how we ensure they work ethically and equitably.

The core mission remains: to create a space where practitioners, researchers, and leaders can share not just their technical insights, but their wisdom about building technology that serves human flourishing. In this new era of generative AI, that mission has never been more vital. Because ultimately, what these twenty years of dialogue have taught me is that technology is never just about the technology. It’s about us, and the world we choose to build together.

Nov 10 25

Community Over Code: Ruth Suehle on Leading The Apache Software Foundation into the Future

by Roberto V. Zicari

“Open communication, consensus, and collaboration are the heart of The Apache Way and always have been. That’s why you hear us say ‘community over code.’”

Foundation Mission & Leadership

Q1. As President of The Apache Software Foundation, you’re leading one of the world’s most influential open-source organizations at a particularly dynamic moment in technology history. Can you share your vision for ASF’s mission today and how it has evolved? What does “The Apache Way”—the foundation’s collaborative, consensus-driven approach to software development—mean in 2025, and why do you believe this methodology remains vital as the software landscape becomes increasingly complex and commercially driven?

Ruth Suehle: The ASF has been around for more than 25 years, which has given us a lot of time developing software collaboratively, and plenty of lessons learned along the way. The Apache Way is the name for our time-tested approach to open source development, but it’s not a set of policies or demands. We have hundreds of projects, each with its own culture, activities, and stage of development. As a whole, however, the ASF’s long-held belief is that open source software thrives best when it remains independent of any single or dominant commercial interest. The Apache Way gives all of those diverse projects a framework for maintaining neutrality and independence. This ensures that our projects serve the broader community.

It’s built around a few concepts, the first of which leads the rest, and that is earned authority. The ASF is built on a web of trust and publicly earned merit, which does not expire. The community is entirely volunteer-based (though of course many are paid by companies to work on projects housed at The ASF, as they are for any code-producing foundation), and votes are all equal. 

Open communication, consensus, and collaboration are the heart of The Apache Way and always have been. That’s why you hear us say “community over code.” A strong and healthy community comes first, because a good community can fix bad code, but good code can’t heal a struggling community.

Q2. The Apache Software Foundation oversees hundreds of projects spanning everything from web servers to big data platforms to AI/ML frameworks. Looking across this diverse portfolio, what are the common threads or emerging patterns you’re seeing? Are there specific technical domains or project types where you’re seeing the most energy, innovation, or community growth? And conversely, are there areas where ASF projects face particular sustainability or relevance challenges?

Ruth Suehle: We actually map projects by category at projects.apache.org, so anyone is welcome to take a look and see where things lie today. What you mostly won’t see reflected there, however, are our projects in the Incubator, which is how new projects come into the foundation. The newest things there at any given time are likely to be reflections of broader trends in technology, and right now the latest additions are largely data-related.

It’s worth noting the other end of the lifecycle, as well: the Apache Attic. This is how we officially retire and archive projects, and it’s an important feature for the foundation and how we support a full project lifecycle. The Attic ensures transparency and provides a formal process for projects that are no longer under active development. It acts as a historical archive, moving projects to a read-only state to preserve their code and documentation for users, while ceasing new development and providing limited oversight to allow for future maintenance if needed.

As for sustainability, I see this not as an ASF challenge or that of a particular project, but as a difficulty facing the entire open source ecosystem right now. I’ve given talks and led panels at a few events in the last year on the subject. It was a significant topic at this year’s Open Source Congress. When you say “sustainability,” people tend to hear “funding,” and that is an important factor, but it’s more complicated than just money. That said, complying with coming regulatory changes, notably the Cyber Resilience Act (CRA), is going to impose significant additional costs on open source projects and foundations. This year we launched our Tooling Initiative to address those concerns, and it’s the first of our ASF Initiatives, which offer targeted sponsorships for specific needs.

Current Projects & Strategic Directions

Q3. Apache has been foundational to the big data revolution with projects like Hadoop, Spark, Kafka, and Flink. As we move into the GenAI era, how are these established projects evolving to serve new workloads and use cases? Are you seeing Apache projects positioning themselves as critical infrastructure for AI applications—for instance, in data pipelines feeding LLMs, vector databases, or real-time inference systems? What role do you envision Apache projects playing in the broader AI infrastructure stack?

Ruth Suehle: Apache projects are not just evolving for the GenAI era—they are actively positioning themselves as critical infrastructure for AI applications, particularly in the domain of data pipelines, real-time context, and orchestration. The shift is from “batch big data” to “real-time, contextualized data streams” that feed LLMs and power real-time inference.

As you state, existing ASF projects are already well-positioned to plug right into the AI ecosystem. Apache Kafka can act as a mission-critical data fabric for generative AI applications, while Apache Flink’s focus on stateful, low-latency, and event-time stream processing is ideal for AI workflows. Apache Spark, Apache Airflow, and Apache Beam all fit well as tools to manage tasks like large-scale data preparation, workflow orchestration, and data abstraction. In 2023, Apache Pinot added support for real-time vector ingestion to enable similarity search as a real-time operation, addressing the need for immediate updates in generative AI pipelines. So Apache projects are not just migrating their existing functionality; they are fundamentally being adapted to own the data layer within AI infrastructure stacks.

Q4. Beyond the well-known flagship projects, what are some emerging or underappreciated Apache projects that you’re particularly excited about? Are there incubating projects or recent graduates from the Apache Incubator that you believe represent important directions for the foundation? What makes these projects significant, and what do they tell us about where the Apache community sees future opportunities?

Ruth Suehle: I can’t even pick favorite songs and movies, much less favorite projects! But seriously, this question is more like picking which of your children you think is the most promising. A huge part of our underlying ethos and governance at the ASF is supporting all projects equally and encouraging all of our projects to be as successful as possible. Their independence and unique communities, coupled with the incredible innovation we tend to see across all open source projects, mean that any of our Incubator projects has the potential to bring significant innovation and advancement in its area.

Q5. As President, what specific directions would you personally like to move The Apache Software Foundation forward? Are there strategic initiatives—whether technical, organizational, or community-focused—that you’re championing? This could range from attracting new types of projects, expanding global community participation, improving project sustainability models, or addressing gaps in the open-source ecosystem that ASF is uniquely positioned to fill.

Ruth Suehle: I mentioned earlier that when people hear “sustainability,” they often hear “money,” but it means other things as well. Fundamentally, sustainability is “what do we need to do to ensure the success of the open source ecosystem for decades to come?” One of the biggest changes I’ve seen in the last two or three years is a highly beneficial one, and that is a move towards more collaboration across the foundations, industry, and project communities. These groups have spent many years working largely as silos, which was fine when the work was all about individual software projects, but we’re facing more and more issues that are best solved by doing the thing that we all know best: collaboration. For The ASF, participating in groups like the Eclipse Foundation’s Open Regulatory Compliance Working Group, acting in our role as an Open Source Initiative Affiliate member, and working through partnerships like the one we have with Alpha-Omega all help us reach solutions to common problems the open source way instead of constantly reinventing the wheel. Earlier this year, I was elected to the OSI board to represent the OSI’s Affiliate members, and I think the OSI’s work to bring together organizations through the Affiliate program and things like the Open Policy Alliance is a great example of the kind of cooperation that is not only the way forward for the entire ecosystem, but critical to continued success.

Another important piece of change we need for sustainability is doing a better job of growing a talent pipeline in open source. “Open source” got a lot of mainstream press for about three years after the term was coined in 1998, and then we all rather quietly built this massive ecosystem, again largely in silos. In 2025, that code is quite literally running the world, and there’s a lot more of it than there used to be. There are larger needs around it than there used to be. But the pool of maintainers has not grown at the same rate, and one place I think we really failed in all of open source is making sure we were bringing in new talent to keep up with the pace at which we were creating. We have plenty of room for improvement in preparing the next generation, and we have to keep building our people.

Simply put, we have a mentorship problem. I believe a large reason for that is that those who built open source software in the early years were doing exactly that: building from scratch. They may have had mentors in writing code, but they didn’t have mentors in open source, because they were writing the playbook as they went. As a result, they also didn’t have mentors in mentoring, i.e., a model to look to when mentoring the next generation of open source contributors. There are still a lot of folks around who have been here since roughly 1998, when the term “open source” was coined, or shortly thereafter. I don’t like the math, but the fact is that those people are retiring (or at least might like to one day!), and when I look around the room at events and on mailing lists, I’m not seeing enough new faces to keep up.

Future Vision & Community

Q6. Looking ahead three to five years, what does success look like for The Apache Software Foundation under your leadership? How do you want the foundation to be positioned relative to the major technological shifts we’re experiencing—not just GenAI, but also cloud-native architectures, edge computing, quantum computing, and emerging regulatory frameworks around software supply chain security and AI governance? What legacy or impact do you hope to achieve during your time as President, and what would you say to technologists, organizations, or students who are considering getting involved with Apache projects or The Apache Way of building software?

Ruth Suehle: There are a few important things coming in the next few years, and none of them are about specific technologies. New technologies are exciting, of course, but part of the reason they’re exciting is because they come and go. So the best thing we can do as a foundation is provide a solid structure for any project to build a community and a healthy open source project. We also need to keep making the technical improvements that will help them and their users, like the work we’re doing to build a foundation-wide release process and tooling infrastructure that enable ASF projects and incoming Incubator projects to fully comply not only with the CRA, but with all of the new regulations developing around the world.

If it’s not already obvious, the best thing I think The ASF can do, and the best way I can help as president, is to set an example for how to build good communities, both within our own foundation and in our collaboration with others. And the best thing that anyone who cares about the future of open source can do right this minute is not to write more code (which we’ll keep doing anyway), but to go find another person and turn them into a contributor, keeping in mind that the ecosystem is now vast and needs a lot more variety of skills than just writing code. For my part, I am always happy to share what I know, because hoarding knowledge helps no one. I frequently end talks by telling people that if there’s anything I know that can help you, whether that’s finding ways to contribute, learning about how to bring your project into The ASF, starting an OSPO, or even making stellar baked goods, please reach out, and that goes for any reader here. Community over code thrives with each one of us building a little more community (and baked goods certainly never hurt!).

……………………………………………..

Ruth Suehle is the director of the open source program office at SAS, an analytics, data management, and AI software company. She is also president of the Apache Software Foundation and a member of the Open Source Initiative (OSI) board of directors. Ruth has helped build open source communities for nearly two decades, much of which she spent in the Open Source Program Office at Red Hat. Co-author of Raspberry Pi Hacks (O’Reilly, December 2013) and previously editor of Red Hat Magazine and opensource.com, Ruth is a writer and core contributor at GeekMom.com.

Nov 4 25

On Database Query Performance in HeatWave and MySQL. Interview with Kaan Kara 

by Roberto V. Zicari

“Of course, in practice, no query optimizer is perfect and there will be edge cases where the way a query is written will impact its performance.”

Q1. What are your current responsibilities as Principal Member of Technical staff?

Kaan Kara: I am the tech lead for query execution in HeatWave. My main responsibilities are implementing new features in HeatWave, maintaining its stability, and supporting our customers with their HeatWave-related use cases.

Q2. Let’s talk about improving database query execution time. The way a query is written has a massive impact on its performance, and developers often face hurdles in structuring them optimally. What is your take on this?

Kaan Kara: SQL is a declarative language. That means, in ideal terms, the database optimizer should produce the best query plan possible to answer the query, no matter how it is written. So, there should not be a need to optimize queries at the SQL level. This is what we strive for when designing optimizers. Of course, in practice, no query optimizer is perfect and there will be edge cases where the way a query is written will impact its performance. I believe there are two practical ways a database service can help address this. The first approach is providing insights into the query plan and its execution. Our goal is to offer detailed and understandable insights about the query plan to our customers, so that they can see where the bottlenecks are. For more information, please click here and here.

Once they see the bottleneck, they can think about how the query can be rewritten, whether certain optimizer hints could help, and so on. Secondly, it is important that the database itself provides alternative execution schemes or user-guided optimization methods. For instance, we recently introduced materialized temporary tables in HeatWave. Once the user sees that a certain query subtree is taking a long time, they can decide to create a materialized view on it, substantially accelerating their queries.
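As a rough illustration of the two ideas in this answer, inspecting a query plan and then materializing an expensive query subtree, here is a small sketch. It uses Python’s built-in `sqlite3` module as a stand-in, not HeatWave or MySQL. SQLite’s `EXPLAIN QUERY PLAN` output differs from what HeatWave reports, and the `orders` table, the `customer_totals` temporary table and the `plan` helper are hypothetical names made up for the example.

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str) -> list[str]:
    """Return the plan steps SQLite reports for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]  # column 3 holds the 'detail' text


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(10_000)],
)

# An aggregate subquery standing in for an expensive query subtree.
subtree = "SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id"
query = f"SELECT customer_id FROM ({subtree}) WHERE total > 500000"

# Step 1: look at the plan to see where the work happens (the base table scan).
print(plan(conn, query))

# Step 2: user-guided optimization. Materialize the subtree once,
# then run later queries against the much smaller stored result.
conn.execute(f"CREATE TEMP TABLE customer_totals AS {subtree}")
fast_query = "SELECT customer_id FROM customer_totals WHERE total > 500000"
print(plan(conn, fast_query))

# Both forms return the same rows as long as `orders` has not changed.
slow_rows = sorted(conn.execute(query).fetchall())
fast_rows = sorted(conn.execute(fast_query).fetchall())
print(slow_rows == fast_rows)
```

The temporary table is a snapshot: if `orders` changes afterwards, the stored result is stale until it is rebuilt, which is the trade-off any materialization scheme has to manage.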

Q3. Indexing is the most common and effective way to speed up queries. What are the major sources of challenges developers face?

Kaan Kara : Indexes come with maintenance cost, and they are often used without proper analysis of the trade-offs between that cost and the performance benefit they provide. HeatWave, with its in-memory columnar data architecture, helps eliminate the need for most indexing in analytical workloads. However, there are certain use cases where indexes provide value. One example is vector embedding-based nearest neighbor search, where index-based lookup is needed to ensure low response times. After introducing the native VECTOR type last year, we recently introduced VECTOR-based indexing in HeatWave, enabling our customers to run approximate nearest neighbor search queries up to 2 orders of magnitude faster. One interesting direction we took was that we did not want to sacrifice result fidelity. We are employing a novel method that utilizes the index only when we believe the results it produces will be accurate.

Q4. Sometimes, the problem isn’t the query itself but the foundation it’s built on. Can you share your experience with this?

Kaan Kara : That is a very good point. Schema design plays a critical role in performance optimization. In some use cases, we see queries with predicates based on complex string operations or regular expressions, which make the query much slower than if the same predicate were applied to numeric columns. But this ties back to the ease of use and declarative nature of interacting with databases. Ideally, the user should not have to worry about these things and should be able to do the most convenient thing, and the database should take care of optimizations behind the scenes.

In HeatWave, we strive to achieve this goal guided by real-world use cases from our customers. For example, we often observe read-heavy workloads repeatedly running the same expensive query subtree. To address this, we are developing an automated result cache that can materialize this subtree result within HeatWave and use it later when it is needed. We believe this feature will significantly improve query performance in many scenarios.

Q5. In a real-world application, a query doesn’t run in isolation. The performance of MySQL is heavily dependent on its configuration. What are your recommendations here?

Kaan Kara : That is true. Thankfully, we have a set of features in our Autopilot suite that eliminate much of the configuration guesswork. For instance, depending on the user’s data and sample queries, Autopilot suggests the correct cluster size, data placement key, appropriate column encodings, and much more. But configuration is usually not a one-and-done exercise. The user’s data and queries change over time. So, it is also crucial to provide detailed insights into the system consistently, so that adjustments can be made. An example is the need for efficient scaling of compute up and down. Some customers require more compute in their peak operating hours for faster queries. In HeatWave, we provide zero-downtime compute elasticity, thanks to our partitioning-based data architecture, to cater for that need.

Q6. Beyond query-level tuning, what are the most significant architectural challenges that impede query performance, such as handling I/O bottlenecks from large table scans, managing inefficient data access patterns caused by normalization choices, or addressing network latency in distributed database environments?

Kaan Kara : This is a great question and one of the core things that we deal with daily when optimizing the HeatWave query execution engine. For an efficient distributed analytics engine, optimizing for I/O bottlenecks (for HeatWave, this means primarily memory and network) is at the top of the priority list. HeatWave has many optimizations to reduce these bottlenecks. For instance, we utilize an efficient vectorized bloom-filter to reduce the amount of probe-side data that we need to shuffle around in our cluster when performing a distributed join.
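The idea of using a Bloom filter to cut down the probe-side data in a distributed join can be sketched in a few lines of Python. This is a toy illustration of the general technique, not HeatWave's vectorized implementation: the `BloomFilter` class, the `bloom_filtered_join` function, the filter size, and the number of hash functions are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: int):
        # Derive num_hashes bit positions from seeded hashes of the key.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, key: int) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: int) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def bloom_filtered_join(build_rows, probe_rows):
    """Join (key, payload) rows, dropping probe rows the filter rules out."""
    bloom = BloomFilter()
    build_index = {}
    for key, payload in build_rows:
        bloom.add(key)
        build_index.setdefault(key, []).append(payload)

    # In a cluster, only these surviving rows would need to be shuffled.
    shipped = [row for row in probe_rows if bloom.might_contain(row[0])]

    joined = [
        (key, build_payload, probe_payload)
        for key, probe_payload in shipped
        for build_payload in build_index.get(key, [])
    ]
    return joined, len(shipped)


build = [(1, "a"), (2, "b")]
probe = [(i, f"p{i}") for i in range(100)]
joined, shipped_count = bloom_filtered_join(build, probe)
```

Because the filter never produces false negatives, the join result is unchanged; the saving comes from `shipped_count` being far smaller than the full probe side.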

Driven by our customer workloads, recently we have worked on a late-materialization feature. Our customers work with wide string columns frequently. To reduce frequent access to these, we perform a transformation in our logical plan: Any wide columns that are not needed are removed from leaf table scan nodes; instead, we project the primary keys. Later in the plan, we introduce additional joins utilizing these primary keys to gather the wide columns that the query needs to produce the result. This feature will improve performance for certain production queries which project many wide columns by a significant amount.
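The late-materialization transformation described above can be illustrated with a small Python sketch. It assumes a table held as a dictionary keyed by primary key; the function name `late_materialized_query` and the sample columns are hypothetical and only stand in for the logical plan rewrite.

```python
def late_materialized_query(table, predicate, wide_columns):
    """Filter on narrow columns first, then fetch wide columns by primary key.

    table: dict mapping primary key -> row dict (a stand-in for a base table).
    """
    wide = set(wide_columns)

    # Leaf scan: carry only the primary key and the narrow columns.
    narrow_scan = (
        (pk, {col: val for col, val in row.items() if col not in wide})
        for pk, row in table.items()
    )

    # The rest of the plan (here just a filter) runs without the wide columns.
    survivors = [(pk, narrow) for pk, narrow in narrow_scan if predicate(narrow)]

    # Late join on the primary key gathers wide columns for surviving rows only.
    return [
        {"id": pk, **narrow, **{col: table[pk][col] for col in wide_columns}}
        for pk, narrow in survivors
    ]


products = {
    1: {"price": 10, "description": "x" * 1000},
    2: {"price": 99, "description": "long text"},
}
expensive = late_materialized_query(
    products, lambda row: row["price"] > 50, ["description"]
)
```

The wide `description` values are touched once, for the single row that survives the filter, instead of being carried through every operator.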

Q7. Specifically, as of MySQL 9.3.0, it is possible to create temporary tables that are stored in the MySQL HeatWave Cluster. What are these tables used for?

Kaan Kara : Yes, our customers can now create temporary tables directly within HeatWave, as in-memory materialized tables. Previously, the only way to load a table into HeatWave was through loading an InnoDB table or loading an external table from object storage. But sometimes, users want to store the result of a query as a temporary materialization without going through the load path, which can be a bottleneck.

Q8. Are these tables similar to conventional database views?

Kaan Kara : They are very similar to materialized views, but temporary tables are static. So, changes in the base tables will not be propagated and temporary tables themselves cannot be changed. If the customer use case requires change propagation from base tables, then materialized views are the right approach, which will be supported soon in HeatWave.

Q9. Can you please explain how these MySQL HeatWave temporary tables help reduce query execution time?

Kaan Kara : Let me give an example: Consider an analyst investigating the transactions on a certain publicly traded stock. The queries will need to perform a join between “stocks” and “transactions” tables on some stock-id, followed by further aggregations (getting volume by date) or maybe further joins and ordering (sorting by largest buyers in each timeframe) etc. In this example, the initial join between “stocks” and “transactions” needs to be performed repeatedly and can be an expensive part of the queries. The analyst can now create a materialized temporary table based on the result of this join directly within HeatWave and it can be used later as much as needed by other operations.
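The analyst workflow above can be sketched with Python's built-in `sqlite3` module. SQLite is used here purely as a stand-in so the example runs anywhere: the schema, the sample rows, and the `acme_tx` table name are invented for illustration, and the exact HeatWave syntax for creating a temporary table inside the cluster may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stocks (stock_id INTEGER PRIMARY KEY, ticker TEXT);
    CREATE TABLE transactions (
        tx_id INTEGER PRIMARY KEY, stock_id INTEGER, tx_date TEXT,
        buyer TEXT, volume INTEGER
    );
    INSERT INTO stocks VALUES (1, 'ACME'), (2, 'INIT');
    INSERT INTO transactions VALUES
        (1, 1, '2025-01-02', 'alice', 100),
        (2, 1, '2025-01-02', 'bob', 250),
        (3, 1, '2025-01-03', 'alice', 50),
        (4, 2, '2025-01-02', 'carol', 75);

    -- Materialize the repeated join once; later queries reuse the result.
    CREATE TEMPORARY TABLE acme_tx AS
        SELECT t.tx_date, t.buyer, t.volume
        FROM stocks s JOIN transactions t ON t.stock_id = s.stock_id
        WHERE s.ticker = 'ACME';
""")

# Follow-up analyses read the temporary table instead of redoing the join.
volume_by_date = conn.execute(
    "SELECT tx_date, SUM(volume) FROM acme_tx GROUP BY tx_date ORDER BY tx_date"
).fetchall()
largest_buyers = conn.execute(
    "SELECT buyer, SUM(volume) AS total FROM acme_tx "
    "GROUP BY buyer ORDER BY total DESC"
).fetchall()
```

As noted in the answer to Q8, such a table is a static snapshot: rows added to `transactions` afterwards would not appear in `acme_tx`.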

Q10. Is calculating the load factor, i.e. measuring how full a hash table is, really a good metric to calculate query execution times? Or are there other metrics that need to be taken into consideration?

Kaan Kara : By itself, it is a narrow metric and only relevant to figure out a single join’s or an aggregation’s cost. During our physical compilation, this metric contributes to our cost estimation indirectly: Depending on a join’s build side cardinality or a group-by’s output cardinality, we choose an appropriate hash table size. This size then dictates the runtime and memory cost of each operation. To estimate the query cost holistically, all relational operators, along with how much data will be moved around, are then considered.
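As a rough illustration of how an estimated cardinality can drive hash table sizing, here is a minimal Python sketch. The power-of-two rounding and the 0.5 target load factor are assumptions chosen for the example, not HeatWave's actual policy.

```python
def hash_table_slots(estimated_rows: int, max_load_factor: float = 0.5) -> int:
    """Smallest power-of-two slot count that keeps the load factor at or below the target."""
    if estimated_rows < 0 or not 0 < max_load_factor <= 1:
        raise ValueError("rows must be >= 0 and load factor in (0, 1]")
    slots = 1
    while slots * max_load_factor < estimated_rows:
        slots *= 2
    return slots


def load_factor(rows: int, slots: int) -> float:
    """Fraction of slots occupied: how full the hash table is."""
    return rows / slots


# A join whose build side is estimated at 1,000 rows.
slots = hash_table_slots(1000)
fill = load_factor(1000, slots)
```

With these assumptions, 1,000 estimated rows lead to 2,048 slots and a load factor just under 0.5. An underestimated cardinality would produce a table that is too small and overfull, which is why the metric matters for a single operator but says little about the whole query.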

Q11. What is your next project you wish to work on?

Kaan Kara : My next projects are around automatic maintenance of materialized views within HeatWave. This entails automatic substitution and creation of materialized views. We are excited to share more soon.

………………………………………………………

Kaan Kara is a principal member of technical staff at Oracle, working as a lead developer mainly responsible for query execution in HeatWave MySQL.

As part of the HeatWave team, he has led multiple projects that substantially improved the performance and the memory efficiency of the query execution engine. A sample of the projects includes pipelined relational operator execution, bloom-filter enhanced distributed joins, base relation compression, and late decompression optimizations. Collectively, these improvements led to factors of geomean reduction in analytical benchmarks, such as TPC-H and TPC-DS, while reducing the memory requirements of the in-memory execution engine, enabling a single HeatWave node with 512GB memory to run the 1TB TPC-H benchmark in full.

More recently, he was the lead developer introducing the new VECTOR type to MySQL, along with highly optimized vector processing functions within HeatWave, laying the data layer foundation that enabled highly anticipated vector store features within HeatWave, such as semantic search and retrieval-augmented generation.

Prior to joining Oracle, Kaan received his doctoral degree in 2020 from ETH Zurich, Systems Group in Computer Science Department. His research focused on using reconfigurable hardware devices (FPGAs) to accelerate data analytics. He has published papers in top database venues such as VLDB and SIGMOD, showcasing the potential benefit of FPGA-based implementations for data partitioning and in-database machine learning tasks.

Resources

On HeatWave MySQL: Query Execution, Performance, Benchmarks, and Vector type. Q&A with Kaan Kara. ODBMS.ORG MARCH 4, 2025

…………………………………….

Follow us on X

Follow us on LinkedIn

Oct 10 25

Beyond the AI Hype: Guido van Rossum on Python’s Philosophy, Simplicity, and the Future of Programming.

by Roberto V. Zicari

” I am definitely not looking forward to an AI-driven future. I’m not worried about AI wanting to kill us all, but I see too many _people_ without ethics or morals getting enabled to do much more damage to society with less effort.”

Q1. The “Zen of Python” emphasizes simplicity and readability. As AI and machine learning systems become increasingly complex, do you believe these core principles are more important than ever, or do they need to be re-evaluated for this new era?

Guido van Rossum: Code still needs to be read and reviewed by humans, otherwise we risk losing control of our existence completely. And it looks like models are also actually happiest coding in languages like Python that have a “humanist” philosophy — since LLMs are good at handling human language structures, and programming languages are in the end intended for human use, it follows that (given some training) such languages are also great to be read and written by LLMs. And most LLMs have had great training in Python.

Q2. When you first created Python, did you ever envision it becoming the dominant language for scientific computing and artificial intelligence? What factors do you believe were most critical to its unexpected success in these fields?

Guido van Rossum: I had no idea! I was not ambitious at all (still am not). I do think that the critical factors to success were twofold. First, as a language, it’s super easy to understand, yet quite powerful. As Bruce Eckel observed, “it fits in your brain”. The second factor is that I designed it to support really good integration with OS services and third-party libraries. This made it versatile and extensible, e.g. by allowing major libraries like NumPy to be developed basically independently from Python itself.

Q3. With the recent work on making the Global Interpreter Lock (GIL) optional and the general demand for performance in AI, what is your perspective on the future of parallelism and concurrency in Python? How crucial is this for the language’s longevity?

Guido van Rossum: I honestly think the importance of the GIL removal project has been overstated. It serves the needs of the largest users (e.g. Meta) while complicating things for potential contributors to the CPython code base (proving that new code does not introduce concurrency bugs is hard). And we regularly see questions from people who try to parallelize their code and get a slowdown — which makes me think that the programming model is not generally well understood. So I worry that Python’s getting too corporate, because the big corporate users can pay for new features only they need (to be clear, they don’t give us money to implement their features, but they give us developers, which comes down to the same thing).

Q4. You were a key advocate for introducing type hints into Python. How do you see static typing evolving within the language, and what role do you think it plays in building the large-scale, mission-critical AI applications we see today?

Guido van Rossum: I don’t know of any large-scale mission-critical AI applications, but I know of plenty of large-scale mission-critical non-AI applications and for those it’s essential to have type hints — otherwise no other tools can do much with your code base. I’d say the cut-off for using type hints is at about 10,000 lines of code — below that, it’s of diminishing value, since a developer can keep enough of it in their head, and traditional dynamic tests do a good-enough job. But once you reach 10,000 it’s hard to maintain code quality without type hints. I wouldn’t foist them upon beginners with the language though.
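To make the point about tooling concrete, here is a minimal sketch of what type hints give a static checker to work with. The `Order` class and `total_cents` function are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Order:
    order_id: int
    amount_cents: int


def total_cents(orders: list[Order]) -> int:
    """Sum order amounts; the annotations tell tools what goes in and out."""
    return sum(order.amount_cents for order in orders)


invoice_total = total_cents([Order(1, 250), Order(2, 150)])

# A static checker such as mypy can reject a call like total_cents("abc")
# before the program runs, because a str is not a list[Order].
```

In a small script the mistake would surface quickly in a test run; in a large code base, the annotation is what lets editors, refactoring tools, and checkers reason about call sites the author has never seen.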

Q5. The transition from Python 2 to 3 was a significant, and at times challenging, chapter in the language’s history. What were the most important lessons from that experience that could inform future major evolutions of Python, especially as new paradigms emerge?

Guido van Rossum: I don’t know how paradigms would affect this (paradigm shifts effectively mean that past experience doesn’t help understand the new reality), but the key lesson is that for any future transitions (even 3.x to 3.x+1) we must always consider how we can support old applications without requiring them to change. Basically the approach to migration must be carefully considered, especially since most libraries have to support a range of versions (something that we didn’t sufficiently appreciate with 2-to-3, and for which we had no good solution planned).

Q6. Python’s simplicity is one of its most celebrated features. As new, powerful libraries for AI add layers of abstraction and complexity, what do you think is the best way for the community to keep the language approachable and prevent it from becoming overwhelming for beginners?

Guido van Rossum: So far the AI libraries I’ve used are not particularly powerful or complex — they just give people a way to talk to a server that can perform some magic. It’s no different than figuring out how to use some of the more complex internet protocols. Maybe the main difference is that AI providers are in such a hurry that they change their APIs every three weeks and provide horrible, chaotic documentation. 🙂 In the end we will do what we’ve always done — the world of software is built on libraries and APIs.

Python has survived many dramatic changes in computing unscathed (in the early ’90s the Internet barely existed, and e.g. Microsoft was distributing software on floppy disks and CD-ROMs — we made it through the development of the Internet and the World-Wide Web, from centralized computers to PCs to software running in the browser, and through huge scaling improvements of hardware).

Q7. Given the specific demands of modern AI development—from data manipulation to model training—if you had the power to add one major feature or change to Python’s core today, what would it be and why?

Guido van Rossum: Nothing comes to mind. AI is over-hyped. It’s still software. In my own use of AI we make good use of it with the help of some small libraries that harness the power of AI to do useful things (notably human language understanding and generation) to data that we manipulate in quite traditional ways. Some of our code is written by a so-called “agent”. But we don’t use “vibe coding” — we stay in control where it comes to architecture and API design.

Q8. Newer languages like Mojo and Julia are being developed specifically for high-performance AI. How do you view this emerging competition, and what must Python do to maintain its leadership position and stay relevant for the next decade of technological advancement?

Guido van Rossum: Mojo is intended to *implement* high-performance AI “kernels”, which is a very exacting piece of classic computer optimization. It has no chance of replacing Python’s ecosystem — that’s just not what they are interested in. I don’t recall Julia being used for high-performance AI — it’s used for high-performance numerical computation, which can serve AI just as well as it can serve other demanding application domains.

Q9. Your role has evolved from Benevolent Dictator for Life (BDFL) to a distinguished engineer at Microsoft. How has this transition influenced your perspective on Python’s development, its community governance, and its place within the larger corporate tech ecosystem?

Guido van Rossum: It’s clearly a demotion. 🙂 I was BDFL until it was no longer possible for a single person to take on all the responsibilities of Python governance. I retired from my day job. I ended up at Microsoft because I realized I wasn’t ready to stop coding, and after Google and Dropbox (and with the ghost of Ballmer thoroughly expurgated) it seemed a good place to try and have some more fun coding.

Q10. Looking back at your incredible journey with Python and looking forward to an AI-driven future, what do you hope the ultimate legacy of Python will be? And on a personal level, how do you envision the craft of programming itself changing in the coming years?

Guido van Rossum: I am definitely not looking forward to an AI-driven future. I’m not worried about AI wanting to kill us all, but I see too many _people_ without ethics or morals getting enabled to do much more damage to society with less effort. The roots for that abuse have been laid by social media, though — another major computer paradigm shift that changed society but didn’t really affect the nature of software.

I hope that Python’s legacy will reflect its spirit of grassroots and worldwide collaboration based on equity and respect rather than power and money, and of enabling “the little guy” to code up dream projects.

………………………..………………

Guido van Rossum is the creator of the Python programming language. 

He grew up in the Netherlands and studied at the University of Amsterdam, where he graduated with a Master’s Degree in Mathematics and Computer Science. 

His first job after college was as a programmer at CWI, where he worked on the ABC language, the Amoeba distributed operating system, and a variety of multimedia projects. During this time he created Python as a side project. He then moved to the United States to take a job at a non-profit research lab in Virginia, married a Texan, worked for several other startups, and moved to California. 

In 2005 he joined Google, where he obtained the rank of Senior Staff Engineer, and in 2013 he started working for Dropbox as a Principal Engineer. 

In October 2019 he retired. After a short retirement he joined Microsoft as Distinguished Engineer in 2020. Until 2018 he was Python’s BDFL (Benevolent Dictator For Life), and he is still deeply involved in the Python community. 

Guido and his family live in Silicon Valley, where they love hiking, biking and birding.

…………………………………….


Sep 11 25

On Debugging with AI. Interview with Mark Williamson

by Roberto V. Zicari

“Quality of code (and everything that goes along with it) isn’t talked about enough in AI conversations!  There are some obvious facets to this – does the code do what you intended?  Is it fast?  Does it crash in the common cases?”

Q1. Can AI write better code than humans?

Mark Williamson: I don’t think so, at least not today.  For one thing, LLM-based AIs are trained on pre-existing code, which was written by fallible humans.  So they at least have the potential to make all the mistakes we do.

Despite that, any coding AI you pick will write better frontend Javascript than me – that’s not my area of expertise.  But I would back an experienced human (with or without AI assistance) to beat an unsupervised AI coder.

Can they beat humans some day?  I assume so – but they’re not doing it today.  And when you factor in other aspects of the Software Engineer’s job (such as building the right thing) it’s even more challenging.

Q2. How do you define what is a “better” code?

Mark Williamson: Quality of code (and everything that goes along with it) isn’t talked about enough in AI conversations!  There are some obvious facets to this – does the code do what you intended?  Is it fast?  Does it crash in the common cases?

A lot of the work a human developer does to achieve this is actually achieved after the initial code is typed in.  There’s an iterative process of learning about and refining the solution – understanding what you’ve made and improving on it.  A lot of this is really debugging, in the broadest sense of the term: the code doesn’t do what you expected and you need to understand and fix it.

There’s another step beyond that, though – whether the code fits its intended purpose.  Getting that fit requires understanding the end user, thinking through the implementation tradeoffs and anticipating future developments.  For now, I see AI as freeing up some time so we can create space for those human insights.

Just focusing on how many lines of code we create is a pattern in the industry – we overvalue simply generating code versus all the other things that software engineers actually do.

Q3. Can AI write some types of code faster and with fewer simple errors?

Mark Williamson: Yes!

In my experience, I’ve found AI to be extremely useful in three scenarios:

  • Writing code that is almost boilerplate – where it’s not a copy-paste problem but requires quite routine changes.
  • Writing code that would be boilerplate for a different engineer – e.g. if I want to write JSON serialisation / deserialisation code in Python it’s easier for me to get an AI assistant to show me the shape of a good solution.
  • Doing refactors that involve restructuring or applying a small fix in a lot of places – a coding agent can handle the detail while I concentrate on the overall shape.

In all these cases, the benefit is in reducing the amount of thinking required to figure out my design approach.  In Daniel Kahneman’s book Thinking Fast and Slow, he describes two modes of thought: System 1 and System 2.  System 1 is the stuff you can just answer automatically, whereas System 2 thought requires effort.

System 2 is tiring – you probably can’t manage more than a couple of hours of really hard thinking about code in a day.  So it’s precious.  An agent lets me offload some work so I can focus that effort on exploring solutions to the real problem I’m trying to solve.

Q4. Large Language Model (LLM)-based AI code assistants are powerful tools, but they have significant limitations that developers must understand. What are such limitations?

Mark Williamson: The most obvious limitation is that they don’t know everything.  They often act as though they do, which is a trap.  “Hallucinations” are the most well-known consequence of this – in which the LLM gives an answer that is confident but ultimately not based in fact.

I like to say that modern AI’s training teaches it what a good answer looks like – they’ve seen lots of examples of them, after all.  So, from an AI’s point of view, a good answer includes attributes like:

  • Projecting confidence.
  • Using the right terminology.
  • Relating suggestions specifically to your question and context.
  • Being right!

If it can satisfy most of those, then it’ll think it’s done a good job.  So when they’re asked a question and they lack facts, an AI will figure “3 out of 4 isn’t bad” and give a dangerously convincing answer that’s not based in reality.

There are two important things we can do to reduce this risk:

  • Supply high-quality context to the underlying model – the more relevant information available the better.  Supplying insufficient information invites the model to guess and supplying irrelevant information encourages it to head off on the wrong track.
  • Verify the model’s answers against a ground truth – run your tests, have experts review your code, verify the dynamic behaviour of the application matches what you expected.

You want to focus the model’s intelligence on solving the real problem (not on guessing), then know when it has actually solved it.
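The second point, verifying answers against a ground truth, can be as simple as refusing to accept a suggestion until it passes known cases. The sketch below is a minimal illustration; `accept_if_verified` and `suggested_median` are hypothetical names, and the "suggestion" is a deliberately plausible but wrong function of the kind an assistant might produce.

```python
def accept_if_verified(candidate, test_cases):
    """Run a suggested function against known-good cases before trusting it.

    test_cases: list of (args_tuple, expected_result) pairs.
    Returns (accepted, failures).
    """
    failures = []
    for args, expected in test_cases:
        try:
            actual = candidate(*args)
        except Exception as exc:  # a crash counts as a failed check
            failures.append((args, expected, repr(exc)))
            continue
        if actual != expected:
            failures.append((args, expected, actual))
    return (not failures), failures


def suggested_median(values):
    # Looks confident and uses the right terminology, but it is wrong
    # for even-length inputs.
    return sorted(values)[len(values) // 2]


cases = [(([1, 3, 2],), 2), (([1, 2, 3, 4],), 2.5)]
accepted, failures = accept_if_verified(suggested_median, cases)
```

Here the odd-length case passes and the even-length case exposes the error, so the suggestion is rejected with a concrete counterexample rather than on a hunch.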

Q5. While LLM-based code assistants are incredibly powerful, there is critical information they lack that limits their effectiveness and makes human oversight essential. Why is this? What does it mean in practice?

Mark Williamson: As a CTO, I’ll divide my answer into two parts:

  • As an engineer, LLMs don’t know enough about your code to solve all the problems you wish they could solve.  They typically don’t have good knowledge of the runtime behaviour of the system, which makes incorrect answers more likely.  And they’re not good at inferring design intent, making it harder to fix subtle bugs correctly.
  • As a product manager, LLMs lack the insight into the true purpose of the software to be built.  You cannot rely on them to design the code to the needs of the end users, long term evolution / maintenance and business tradeoffs required.

Q6. LLMs are brilliant at static analysis—interpreting the text of a codebase, logs, and other documents. But they are blind to dynamic behavior. This is the critical information they lack and cannot get. Why? Do you have a solution for this problem?

Mark Williamson: Coding agents have a similar weakness to humans: they can’t see what the program really did at runtime and it’s hard to reason about why things happened.  They can get some of this from logs (and LLMs are really good at reading logs!) but logging can only capture so much.

There’s a catch 22 here for the developer: if you’d been able to predict precisely what logging you’d need to fix the bug you’re investigating, then you’d have known enough to avoid the bug in the first place.  There’s no reason to think that’s different for LLMs.

Coding agents can follow the same tedious loop that humans do: adding more logging to a codebase and running stuff again (or perhaps asking a human to obtain more logs some other way).

They can even do this toil more enthusiastically than any human! But the speed you gained from the agent may just disappear into a swamp of rebuilding, attempting to reproduce, finding what logging statements are still missing and then repeating the process.  This kind of inefficiency will be bad news for any Engineering department hoping to improve productivity in return for their AI spend.

Q7. It seems that time travel debugging (TTD) directly addresses this limitation. Please tell us more.

Mark Williamson: Time travel debugging captures a trace of everything a program does during execution.  The resulting recordings effectively represent the whole state of memory at every machine instruction the program executed.

Anything you want to know about the program’s runtime behaviour can then be queried from the recording, without needing to re-run or change the code.  Rare bugs become fully reproducible and any state can be explored in detail.  Moreover, the ability to rewind time makes it easy to explore why a bad state arose, not just what the state was.

Of course, storing all of memory at every point in execution time would be extremely inefficient!  A modern, scalable time travel debugger stores only information that flows into the program (initial memory state, IO from disk and network, system calls results, non-deterministic CPU instructions, etc).  This makes it possible to efficiently recompute all other state on demand.  Watch the talk “How do Time Travel Debuggers Work?” for the full details on how a modern time travel debugger is built.  
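The record-only-the-inputs idea can be shown with a toy Python sketch. This is not how Undo's engine is implemented; the `Recorder`, `Replayer`, and `program` names are invented, and a random number stands in for every non-deterministic input (I/O, system call results, and so on).

```python
import random


class Recorder:
    """Runs live and logs every non-deterministic value the program consumes."""

    def __init__(self) -> None:
        self.log = []

    def read_input(self) -> int:
        value = random.randint(0, 1_000_000)  # stand-in for I/O or a syscall result
        self.log.append(value)
        return value


class Replayer:
    """Feeds the logged values back in the same order, so nothing is random."""

    def __init__(self, log) -> None:
        self._values = iter(log)

    def read_input(self) -> int:
        return next(self._values)


def program(env, steps: int = 5):
    """Deterministic given its inputs; returns every intermediate state."""
    state, history = 0, []
    for _ in range(steps):
        state = (state * 31 + env.read_input()) % 1_000_003
        history.append(state)
    return history


recorder = Recorder()
live_history = program(recorder)

# Only the five inputs were stored, yet every intermediate state is
# recomputed exactly on replay.
replayed_history = program(Replayer(recorder.log))
```

The log holds one value per input rather than a snapshot of state per step, and replaying it reproduces the same history every time, which is what makes a rare failure repeatable.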

For an AI, this capability is ideal.  Remember that we need high-quality context to feed the model and a ground truth to make sure its answers are based in reality.  With time travel debugging, a coding agent has access to a recording of the program’s dynamic state and can drill down in detail on any suspicious behaviours – that gives us high-quality context.  The ground truth comes from the deterministic nature of the recording and also makes it possible to verify the AI’s findings.

These properties mean that AI coding agents get smarter when given access to a time travel debugging system.

Q8. You have released an add-on extension called explain, which integrates with your UDB debugger (part of the Undo Suite). What is it and what is it useful for?

Mark Williamson: Good question. Let me explain first what Undo is to set the context. It’s our time travel debugging technology (which runs on Linux x86 and ARM64) and is mostly used to debug complex enterprise software that makes use of advanced multithreading techniques, shared memory, direct device accesses, etc.

The Undo Suite captures precise recordings of unmodified programs using just-in-time binary instrumentation. The two main components of the Undo Suite are:

  • LiveRecorder – which captures program executions into portable recording files.
  • UDB – which provides a GDB-compatible interface to debug both live processes and recordings (but also integrates into IDEs such as VS Code).

The explain extension is our first step in integrating AI with a time travel debugging system.  It provides two pieces of functionality:

  • An MCP (Model Context Protocol) server – this exports the functionality of our UDB debugger for use by an AI agent, allowing it to integrate into existing AI workflows including agentic IDEs (such as VS Code with Copilot, Cursor or Windsurf).
  • The explain command itself, which provides additional tight integration with terminal-based coding agents (such as Claude Code, Amp and Codex CLI) where available.

In either case, we’re providing the power of time travel debugging to an AI, so that it can reason about the dynamic behaviour of a program.  As the name suggests, this extension has a particular focus on explaining program behaviour – how a given state arose, why the program crashed, etc.

We provide a carefully-designed set of tools to the agent so that it can answer these questions effectively. It’s important that the design of the MCP tools guides the actions to be taken by the LLM, otherwise it can easily get overwhelmed by the complexity.

In an agentic IDE you can connect to the MCP server in a running UDB session – then ask the agent questions (use the /explain prompt exported by the server for best results).  In UDB itself, you can just type the explain command and we’ll automatically invoke your preferred terminal coding agent and put it to work on your problem.

Q9.  Can you show us an example of how time traveling with an AI code assistant works in practice?

Mark Williamson: Sure! I’d recommend watching these two demo videos:

  1. The cache_calculate demo video on the Undo website which showcases how to use explain to get AI to tell you what has gone wrong in the program.
  2. This YouTube video where I use AI + time travel debugging to explore the codebase of the legendary Doom game and understand exactly what the program did when I played it.

We have additional demos, showcasing more advanced functionality, which aren’t yet public – you can book a personalised demo from https://undo.io/products/undo-ai/ to see the more advanced AI debugging functionality we’re currently building.

Qx. Anything else you wish to add?

Mark Williamson: The core message here is that AI-Augmented Software Engineers still need the right tools to do their jobs well.  Our goal is to make AI coding agents more effective at understanding and fixing complex code, improving the return on investment Engineering departments get on their AI stack.

The next big step for us will be designing a UX to be used by AIs instead of by humans.  Providing time travel debugging to a coding agent is already useful, but to get the best performance we need to work with what LLMs are good at.  In other words:

  • A query-like interface: rather than the statefulness of a debugger, LLMs are happiest when they can ask Big Questions and get a report in answer.  Our engine lets us extract detailed information very quickly from a recording so that an AI can start with an overview, then drill down.
  • Specialised, composable tools: a debugger provides quite general tools (stepping, breakpoints, etc) for a human developer to apply to any problem.  Coding agents can use these but we believe LLM intelligence is best spent on solving the core problem well, rather than diluting it on planning complex tool use.  A specialised set of analyses will allow the LLM to focus on what it’s good at – finding patterns and proposing fixes.

On top of these tools and the data contained within our recordings, we are building Undo AI – a product to enable agentic debugging at enterprise scale.  We’re currently taking applications for our pilot program. Please get in touch to find out more at undo.io.

……………………………………………

Mark Williamson, Chief Technical Officer, Undo

After a few years as our Chief Software Architect, Mark is now acting as Undo’s CTO. Mark loves developing new technology and getting it to people who can benefit. He is a specialist in kernel-level, low-level Linux and embedded development, with wide experience in cross-disciplinary engineering.

In his previous role, his remit was to align the product’s architecture with the company’s needs, provide technical and design leadership, and lead internal quality work. One of his proudest achievements is his quest towards an all-green test suite!

As Undo’s CTO, Mark’s primary responsibility is to scale product-market fit and ensure we take our products in the right direction to meet the needs of a broader spectrum of customers.

Mark is also an author on Medium, a conference speaker, and a new home owner enjoying the delights of emergency home repairs!

………………………..
